10010010@unknown@formal@none@1@S@
Algorithm
@@@@1@1@@oe@19-8-2009 10010020@unknown@formal@none@1@S@In [[mathematics]], [[computing]], [[linguistics]] and related disciplines, an '''algorithm''' is a sequence of instructions, often used for [[calculation]] and [[data processing]].@@@@1@21@@oe@19-8-2009 10010030@unknown@formal@none@1@S@It is formally a type of [[effective method]] in which a list of well-defined instructions for completing a task will, when given an initial state, proceed through a well-defined series of successive states, eventually terminating in an end-state.@@@@1@38@@oe@19-8-2009 10010040@unknown@formal@none@1@S@The transition from one state to the next is not necessarily [[deterministic]]; some algorithms, known as [[probabilistic algorithms]], incorporate randomness.@@@@1@20@@oe@19-8-2009 10010050@unknown@formal@none@1@S@A partial formalization of the concept began with attempts to solve the [[Entscheidungsproblem]] (the "decision problem") posed by [[David Hilbert]] in 1928.@@@@1@22@@oe@19-8-2009 10010060@unknown@formal@none@1@S@Subsequent formalizations were framed as attempts to define "[[effective calculability]]" (Kleene 1943:274) or "effective method" (Rosser 1939:225); those formalizations included the Gödel-Herbrand-Kleene [[Recursion (computer science)|recursive function]]s of 1930, 1934 and 1935, [[Alonzo Church]]'s [[lambda calculus]] of 1936, [[Emil Post]]'s "Formulation I" of 1936, and [[Alan Turing]]'s [[Turing machines]] of 1936-7 and 1939.@@@@1@52@@oe@19-8-2009 10010070@unknown@formal@none@1@S@==Etymology==@@@@1@1@@oe@19-8-2009 10010080@unknown@formal@none@1@S@[[Muhammad ibn Mūsā al-Khwārizmī|Al-Khwārizmī]], [[Persian people|Persian]] [[astronomer]] and [[mathematician]], wrote a [[treatise]] in [[Arabic]] in 825 AD, ''On Calculation with Hindu Numerals''.@@@@1@22@@oe@19-8-2009 10010090@unknown@formal@none@1@S@(See [[algorism]]).@@@@1@2@@oe@19-8-2009 10010100@unknown@formal@none@1@S@It was translated into [[Latin]] in the 12th century as ''Algoritmi de numero Indorum'' (al-Daffa 1977), which title was likely intended to mean "Algoritmi on the numbers of the Indians", where "Algoritmi" was the translator's rendition of the author's name; but people misunderstanding the title treated ''Algoritmi'' as a Latin plural and this led to the word "algorithm" (Latin ''algorismus'') coming to mean "calculation method".@@@@1@65@@oe@19-8-2009 10010110@unknown@formal@none@1@S@The intrusive "th" is most likely due to a [[false cognate]] with the [[Greek language|Greek]] {{lang|grc|ἀριθμός}} (''arithmos'') meaning "number".@@@@1@19@@oe@19-8-2009 10010120@unknown@formal@none@1@S@== Why algorithms are necessary: an informal definition ==@@@@1@9@@oe@19-8-2009 10010130@unknown@formal@none@1@S@No generally accepted ''formal'' definition of "algorithm" exists yet.@@@@1@9@@oe@19-8-2009 10010140@unknown@formal@none@1@S@An informal definition could be "an algorithm is a computer program that calculates something."@@@@1@14@@oe@19-8-2009 10010150@unknown@formal@none@1@S@For some people, a program is only an algorithm if it stops eventually.@@@@1@13@@oe@19-8-2009 10010160@unknown@formal@none@1@S@For others, a program is only an algorithm if it stops before a given number of calculation steps.@@@@1@18@@oe@19-8-2009 10010170@unknown@formal@none@1@S@A prototypical example of an "algorithm" is Euclid's algorithm to determine the maximum common divisor of two integers greater than one: "subtract the smallest number from the biggest one, repeat until you get a zero or a one".@@@@1@38@@oe@19-8-2009 
10010180@unknown@formal@none@1@S@This procedure is know to stop always, and the number of subtractions needed is always smaller than the biggest of the two numbers.@@@@1@23@@oe@19-8-2009 10010190@unknown@formal@none@1@S@We can derive clues to the issues involved and an informal meaning of the word from the following quotation from {{Harvtxt|Boolos|Jeffrey|1974, 1999}} (boldface added):@@@@1@24@@oe@19-8-2009 10010200@unknown@formal@none@1@S@
No human being can write fast enough, or long enough, or small enough to list all members of an enumerably infinite set by writing out their names, one after another, in some notation.@@@@1@34@@oe@19-8-2009 10010210@unknown@formal@none@1@S@But humans can do something equally useful, in the case of certain enumerably infinite sets: They can give '''explicit instructions for determining the nth member of the set''', for arbitrary finite n.@@@@1@32@@oe@19-8-2009 10010220@unknown@formal@none@1@S@Such instructions are to be given quite explicitly, in a form in which '''they could be followed by a computing machine''', or by a '''human who is capable of carrying out only very elementary operations on symbols'''
@@@@1@38@@oe@19-8-2009 10010230@unknown@formal@none@1@S@The words "enumerably infinite" mean "countable using integers perhaps extending to infinity".@@@@1@12@@oe@19-8-2009 10010240@unknown@formal@none@1@S@Thus Boolos and Jeffrey are saying that an algorithm ''implies'' instructions for a process that "creates" output integers from an ''arbitrary'' "input" integer or integers that, in theory, can be chosen from 0 to infinity.@@@@1@35@@oe@19-8-2009 10010250@unknown@formal@none@1@S@Thus we might expect an algorithm to be an algebraic equation such as '''y = m + n''' — two arbitrary "input variables" '''m''' and '''n''' that produce an output '''y'''.@@@@1@31@@oe@19-8-2009 10010260@unknown@formal@none@1@S@As we see in [[Algorithm characterizations]] — the word algorithm implies much more than this, something on the order of (for our addition example):@@@@1@24@@oe@19-8-2009 10010270@unknown@formal@none@1@S@:Precise instructions (in language understood by "the computer") for a "fast, efficient, good" ''process'' that specifies the "moves" of "the computer" (machine or human, equipped with the necessary internally-contained information and capabilities) to find, decode, and then munch arbitrary input integers/symbols '''m''' and '''n''', symbols '''+''' and '''=''' ... and (reliably, correctly, "effectively") produce, in a "reasonable" [[time]], output-integer '''y''' at a specified place and in a specified format.@@@@1@69@@oe@19-8-2009 10010280@unknown@formal@none@1@S@The concept of ''algorithm'' is also used to define the notion of [[decidability (logic)|decidability]].@@@@1@14@@oe@19-8-2009 10010290@unknown@formal@none@1@S@That notion is central for explaining how [[formal system]]s come into being starting from a small set of [[axiom]]s and rules.@@@@1@21@@oe@19-8-2009 10010300@unknown@formal@none@1@S@In [[logic]], the time that an algorithm requires to complete cannot be measured, as it is not apparently related with our customary physical dimension.@@@@1@24@@oe@19-8-2009 10010310@unknown@formal@none@1@S@From such uncertainties, that characterize ongoing work, stems the unavailability of a definition of ''algorithm'' that suits both concrete (in some sense) and abstract usage of the term.@@@@1@28@@oe@19-8-2009 10010320@unknown@formal@none@1@S@:''For a detailed presentation of the various points of view around the definition of "algorithm" see [[Algorithm characterizations]].@@@@1@18@@oe@19-8-2009 10010330@unknown@formal@none@1@S@For examples of simple addition algorithms specified in the detailed manner described in [[Algorithm characterizations]], see [[Algorithm examples]].''@@@@1@18@@oe@19-8-2009 10010340@unknown@formal@none@1@S@== Formalization of algorithms ==@@@@1@5@@oe@19-8-2009 10010350@unknown@formal@none@1@S@Algorithms are essential to the way [[computer]]s process information, because a [[computer program]] is essentially an algorithm that tells the computer what specific steps to perform (in what specific order) in order to carry out a specified task, such as calculating employees’ paychecks or printing students’ report cards.@@@@1@48@@oe@19-8-2009 10010360@unknown@formal@none@1@S@Thus, an algorithm can be considered to be any sequence of operations that can be performed by a [[Turing completeness|Turing-complete]] system.@@@@1@21@@oe@19-8-2009 10010370@unknown@formal@none@1@S@Authors who assert this thesis include Savage (1987) and Gurevich (2000):@@@@1@11@@oe@19-8-2009 10010380@unknown@formal@none@1@S@
...Turing's informal argument in favor of his thesis justifies a stronger thesis: every algorithm can be simulated by a Turing machine (Gurevich 2000:1)...according to Savage [1987], an algorithm is a computational process defined by a Turing machine.@@@@1@38@@oe@19-8-2009 10010390@unknown@formal@none@1@S@(Gurevich 2000:3)
@@@@1@3@@oe@19-8-2009 10010400@unknown@formal@none@1@S@Typically, when an algorithm is associated with processing information, data are read from an input source or device, written to an output sink or device, and/or stored for further processing.@@@@1@30@@oe@19-8-2009 10010410@unknown@formal@none@1@S@Stored data are regarded as part of the internal state of the entity performing the algorithm.@@@@1@16@@oe@19-8-2009 10010420@unknown@formal@none@1@S@In practice, the state is stored in a [[data structure]], but an algorithm requires the internal data only for specific operation sets called [[abstract data type]]s.@@@@1@26@@oe@19-8-2009 10010430@unknown@formal@none@1@S@For any such computational process, the algorithm must be rigorously defined: specified in the way it applies in all possible circumstances that could arise.@@@@1@24@@oe@19-8-2009 10010440@unknown@formal@none@1@S@That is, any conditional steps must be systematically dealt with, case-by-case; the criteria for each case must be clear (and computable).@@@@1@21@@oe@19-8-2009 10010450@unknown@formal@none@1@S@Because an algorithm is a precise list of precise steps, the order of computation will almost always be critical to the functioning of the algorithm.@@@@1@25@@oe@19-8-2009 10010460@unknown@formal@none@1@S@Instructions are usually assumed to be listed explicitly, and are described as starting "from the top" and going "down to the bottom", an idea that is described more formally by ''[[control flow|flow of control]]''.@@@@1@34@@oe@19-8-2009 10010470@unknown@formal@none@1@S@So far, this discussion of the formalization of an algorithm has assumed the premises of [[imperative programming]].@@@@1@17@@oe@19-8-2009 10010480@unknown@formal@none@1@S@This is the most common conception, and it attempts to describe a task in discrete, "mechanical" means.@@@@1@17@@oe@19-8-2009 10010490@unknown@formal@none@1@S@Unique to this conception of formalized algorithms is the [[assignment operation]], setting the value of a variable.@@@@1@17@@oe@19-8-2009 10010500@unknown@formal@none@1@S@It derives from the intuition of "[[memory]]" as a scratchpad.@@@@1@10@@oe@19-8-2009 10010510@unknown@formal@none@1@S@There is an example below of such an assignment.@@@@1@9@@oe@19-8-2009 10010520@unknown@formal@none@1@S@For some alternate conceptions of what constitutes an algorithm see [[functional programming]] and [[logic programming]] .@@@@1@16@@oe@19-8-2009 10010530@unknown@formal@none@1@S@=== Termination ===@@@@1@3@@oe@19-8-2009 10010540@unknown@formal@none@1@S@Some writers restrict the definition of ''algorithm'' to procedures that eventually finish.@@@@1@12@@oe@19-8-2009 10010550@unknown@formal@none@1@S@In such a category Kleene places the "''decision procedure'' or ''decision method'' or ''algorithm'' for the question" (Kleene 1952:136).@@@@1@19@@oe@19-8-2009 10010560@unknown@formal@none@1@S@Others, including Kleene, include procedures that could run forever without stopping; such a procedure has been called a "computational method" (Knuth 1997:5) or "''calculation procedure'' or ''algorithm''" (Kleene 1952:137); however, Kleene notes that such a method must eventually exhibit "some object" (Kleene 1952:137).@@@@1@43@@oe@19-8-2009 10010570@unknown@formal@none@1@S@Minsky makes the pertinent observation, in regards to determining whether an algorithm will eventually terminate (from a particular starting state):@@@@1@20@@oe@19-8-2009 10010580@unknown@formal@none@1@S@
But if the length of the process is not known in advance, then "trying" it may not be decisive, because if the process does go on forever — then at no time will we ever be sure of the answer (Minsky 1967:105).
@@@@1@43@@oe@19-8-2009 10010590@unknown@formal@none@1@S@As it happens, no other method can do any better, as was shown by [[Alan Turing]] with his celebrated result on the undecidability of the so-called [[halting problem]].@@@@1@28@@oe@19-8-2009 10010600@unknown@formal@none@1@S@There is no algorithmic procedure for determining of arbitrary algorithms whether or not they terminate from given starting states.@@@@1@19@@oe@19-8-2009 10010610@unknown@formal@none@1@S@The analysis of algorithms for their likelihood of termination is called [[termination analysis]].@@@@1@13@@oe@19-8-2009 10010620@unknown@formal@none@1@S@See the examples of (im-)"proper" subtraction at [[partial function]] for more about what can happen when an algorithm fails for certain of its input numbers — e.g., (i) non-termination, (ii) production of "junk" (output in the wrong format to be considered a number) or no number(s) at all (halt ends the computation with no output), (iii) wrong number(s), or (iv) a combination of these.@@@@1@64@@oe@19-8-2009 10010630@unknown@formal@none@1@S@Kleene proposed that the production of "junk" or failure to produce a number is solved by having the algorithm detect these instances and produce e.g., an error message (he suggested "0"), or preferably, force the algorithm into an endless loop (Kleene 1952:322).@@@@1@42@@oe@19-8-2009 10010640@unknown@formal@none@1@S@Davis does this to his subtraction algorithm — he fixes his algorithm in a second example so that it is proper subtraction (Davis 1958:12-15).@@@@1@24@@oe@19-8-2009 10010650@unknown@formal@none@1@S@Along with the logical outcomes "true" and "false" Kleene also proposes the use of a third logical symbol "u" — undecided (Kleene 1952:326) — thus an algorithm will always produce ''something'' when confronted with a "proposition".@@@@1@36@@oe@19-8-2009 10010660@unknown@formal@none@1@S@The problem of wrong answers must be solved with an independent "proof" of the algorithm e.g., using induction:@@@@1@18@@oe@19-8-2009 10010670@unknown@formal@none@1@S@
We normally require auxiliary evidence for this (that the algorithm correctly defines a [[mu recursive function]]), e.g., in the form of an inductive proof that, for each argument value, the computation terminates with a unique value (Minsky 1967:186).
@@@@1@39@@oe@19-8-2009 10010680@unknown@formal@none@1@S@=== Expressing algorithms ===@@@@1@4@@oe@19-8-2009 10010690@unknown@formal@none@1@S@Algorithms can be expressed in many kinds of notation, including [[natural language]]s, [[pseudocode]], [[flowchart]]s, and [[programming language]]s.@@@@1@17@@oe@19-8-2009 10010700@unknown@formal@none@1@S@Natural language expressions of algorithms tend to be verbose and ambiguous, and are rarely used for complex or technical algorithms.@@@@1@20@@oe@19-8-2009 10010710@unknown@formal@none@1@S@Pseudocode and flowcharts are structured ways to express algorithms that avoid many of the ambiguities common in natural language statements, while remaining independent of a particular implementation language.@@@@1@28@@oe@19-8-2009 10010720@unknown@formal@none@1@S@Programming languages are primarily intended for expressing algorithms in a form that can be executed by a [[computer]], but are often used as a way to define or document algorithms.@@@@1@30@@oe@19-8-2009 10010730@unknown@formal@none@1@S@There is a wide variety of representations possible and one can express a given [[Turing machine]] program as a sequence of machine tables (see more at [[finite state machine]] and [[state transition table]]), as flowcharts (see more at [[state diagram]]), or as a form of rudimentary [[machine code]] or [[assembly code]] called "sets of quadruples" (see more at [[Turing machine]]).@@@@1@60@@oe@19-8-2009 10010740@unknown@formal@none@1@S@Sometimes it is helpful in the description of an algorithm to supplement small "flow charts" (state diagrams) with natural-language and/or arithmetic expressions written inside "[[block diagram]]s" to summarize what the "flow charts" are accomplishing.@@@@1@34@@oe@19-8-2009 10010750@unknown@formal@none@1@S@Representations of algorithms are generally classed into three accepted levels of Turing machine description (Sipser 2006:157):@@@@1@16@@oe@19-8-2009 10010760@unknown@formal@none@1@S@*'''1 High-level description''':@@@@1@3@@oe@19-8-2009 10010770@unknown@formal@none@1@S@:: "...prose to describe an algorithm, ignoring the implementation details.@@@@1@10@@oe@19-8-2009 10010780@unknown@formal@none@1@S@At this level we do not need to mention how the machine manages its tape or head"@@@@1@17@@oe@19-8-2009 10010790@unknown@formal@none@1@S@*'''2 Implementation description''':@@@@1@3@@oe@19-8-2009 10010800@unknown@formal@none@1@S@:: "...prose used to define the way the Turing machine uses its head and the way that it stores data on its tape.@@@@1@23@@oe@19-8-2009 10010810@unknown@formal@none@1@S@At this level we do not give details of states or transition function"@@@@1@13@@oe@19-8-2009 10010820@unknown@formal@none@1@S@*'''3 Formal description''':@@@@1@3@@oe@19-8-2009 10010830@unknown@formal@none@1@S@:: Most detailed, "lowest level", gives the Turing machine's "state table".@@@@1@11@@oe@19-8-2009 10010840@unknown@formal@none@1@S@:''For an example of the simple algorithm "Add m+n" described in all three levels see [[Algorithm examples]].''@@@@1@17@@oe@19-8-2009 10010850@unknown@formal@none@1@S@=== Implementation ===@@@@1@3@@oe@19-8-2009 10010860@unknown@formal@none@1@S@Most algorithms are intended to be implemented as [[computer programs]].@@@@1@10@@oe@19-8-2009 10010870@unknown@formal@none@1@S@However, algorithms are also implemented by other means, such as in a biological [[neural network]] (for example, the [[human brain]] implementing [[arithmetic]] or an insect looking for food), in an [[electrical circuit]], or in a mechanical 
device.@@@@1@37@@oe@19-8-2009 10010880@unknown@formal@none@1@S@== Example ==@@@@1@3@@oe@19-8-2009 10010890@unknown@formal@none@1@S@One of the simplest algorithms is to find the largest number in an (unsorted) list of numbers.@@@@1@17@@oe@19-8-2009 10010900@unknown@formal@none@1@S@The solution necessarily requires looking at every number in the list, but only once at each.@@@@1@16@@oe@19-8-2009 10010910@unknown@formal@none@1@S@From this follows a simple algorithm, which can be stated in a high-level description [[English language|English]] prose, as:@@@@1@18@@oe@19-8-2009 10010920@unknown@formal@none@1@S@'''High-level description:'''@@@@1@2@@oe@19-8-2009 10010930@unknown@formal@none@1@S@# Assume the first item is largest.@@@@1@7@@oe@19-8-2009 10010940@unknown@formal@none@1@S@# Look at each of the remaining items in the list and if it is larger than the largest item so far, make a note of it.@@@@1@27@@oe@19-8-2009 10010950@unknown@formal@none@1@S@# The last noted item is the largest in the list when the process is complete.@@@@1@16@@oe@19-8-2009 10010960@unknown@formal@none@1@S@'''(Quasi-)formal description:''' Written in prose but much closer to the high-level language of a computer program, the following is the more formal coding of the algorithm in [[pseudocode]] or [[pidgin code]]:@@@@1@31@@oe@19-8-2009 10010970@unknown@formal@none@1@S@Input: A non-empty list of numbers ''L''.@@@@1@7@@oe@19-8-2009 10010980@unknown@formal@none@1@S@Output: The ''largest'' number in the list ''L''. ''largest'' ← ''L''0 '''for each''' ''item'' '''in''' the list ''L≥1'', '''do''' '''if''' the ''item'' > ''largest'', '''then''' ''largest'' ← the ''item'' '''return''' ''largest''@@@@1@31@@oe@19-8-2009 10010990@unknown@formal@none@1@S@For a more complex example of an algorithm, see [[Euclid's algorithm]] for the [[greatest common divisor]], one of the earliest algorithms known.@@@@1@22@@oe@19-8-2009 10011000@unknown@formal@none@1@S@=== Algorithm analysis ===@@@@1@4@@oe@19-8-2009 10011010@unknown@formal@none@1@S@As it happens, it is important to know how much of a particular resource (such as time or storage) is required for a given algorithm.@@@@1@25@@oe@19-8-2009 10011020@unknown@formal@none@1@S@Methods have been developed for the [[analysis of algorithms]] to obtain such quantitative answers; for example, the algorithm above has a time requirement of O(''n''), using the [[big O notation]] with ''n'' as the length of the list.@@@@1@38@@oe@19-8-2009 10011030@unknown@formal@none@1@S@At all times the algorithm only needs to remember two values: the largest number found so far, and its current position in the input list.@@@@1@25@@oe@19-8-2009 10011040@unknown@formal@none@1@S@Therefore it is said to have a space requirement of ''O(1)'', if the space required to store the input numbers is not counted, or O (log ''n'') if it is counted.@@@@1@31@@oe@19-8-2009 10011050@unknown@formal@none@1@S@Different algorithms may complete the same task with a different set of instructions in less or more time, space, or effort than others.@@@@1@23@@oe@19-8-2009 10011060@unknown@formal@none@1@S@For example, given two different recipes for making potato salad, one may have ''peel the potato'' before ''boil the potato'' while the other presents the steps in the reverse order, yet they both call for these steps to be repeated for all potatoes and end when the potato salad is ready to be eaten.@@@@1@54@@oe@19-8-2009 10011070@unknown@formal@none@1@S@The [[analysis of algorithms|analysis and study of algorithms]] is a discipline of 
[[computer science]], and is often practiced abstractly without the use of a specific [[programming language]] or implementation.@@@@1@29@@oe@19-8-2009 10011080@unknown@formal@none@1@S@In this sense, algorithm analysis resembles other mathematical disciplines in that it focuses on the underlying properties of the algorithm and not on the specifics of any particular implementation.@@@@1@29@@oe@19-8-2009 10011090@unknown@formal@none@1@S@Usually [[pseudocode]] is used for analysis as it is the simplest and most general representation.@@@@1@15@@oe@19-8-2009 10011100@unknown@formal@none@1@S@== Classes ==@@@@1@3@@oe@19-8-2009 10011110@unknown@formal@none@1@S@There are various ways to classify algorithms, each with its own merits.@@@@1@12@@oe@19-8-2009 10011120@unknown@formal@none@1@S@=== Classification by implementation ===@@@@1@5@@oe@19-8-2009 10011130@unknown@formal@none@1@S@One way to classify algorithms is by implementation means.@@@@1@9@@oe@19-8-2009 10011140@unknown@formal@none@1@S@* '''Recursion''' or '''iteration''': A [[recursive algorithm]] is one that invokes (makes reference to) itself repeatedly until a certain condition matches, which is a method common to [[functional programming]].@@@@1@29@@oe@19-8-2009 10011150@unknown@formal@none@1@S@[[Iteration|Iterative]] algorithms use repetitive constructs like [[Control flow#Loops|loops]] and sometimes additional data structures like [[Stack (data structure)|stacks]] to solve the given problems.@@@@1@22@@oe@19-8-2009 10011160@unknown@formal@none@1@S@Some problems are naturally suited for one implementation or the other.@@@@1@11@@oe@19-8-2009 10011170@unknown@formal@none@1@S@For example, [[towers of hanoi]] is well understood in recursive implementation.@@@@1@11@@oe@19-8-2009 10011180@unknown@formal@none@1@S@Every recursive version has an equivalent (but possibly more or less complex) iterative version, and vice versa.@@@@1@17@@oe@19-8-2009 10011190@unknown@formal@none@1@S@* '''Logical''': An algorithm may be viewed as controlled [[Deductive reasoning|logical deduction]].@@@@1@12@@oe@19-8-2009 10011200@unknown@formal@none@1@S@This notion may be expressed as: '''Algorithm = logic + control''' (Kowalski 1979).@@@@1@13@@oe@19-8-2009 10011210@unknown@formal@none@1@S@The logic component expresses the axioms that may be used in the computation and the control component determines the way in which deduction is applied to the axioms.@@@@1@28@@oe@19-8-2009 10011220@unknown@formal@none@1@S@This is the basis for the [[logic programming]] paradigm.@@@@1@9@@oe@19-8-2009 10011230@unknown@formal@none@1@S@In pure logic programming languages the control component is fixed and algorithms are specified by supplying only the logic component.@@@@1@20@@oe@19-8-2009 10011240@unknown@formal@none@1@S@The appeal of this approach is the elegant [[Formal semantics of programming languages|semantics]]: a change in the axioms has a well defined change in the algorithm.@@@@1@26@@oe@19-8-2009 10011250@unknown@formal@none@1@S@* '''Serial''' or '''parallel''' or '''distributed''': Algorithms are usually discussed with the assumption that computers execute one instruction of an algorithm at a time.@@@@1@24@@oe@19-8-2009 10011260@unknown@formal@none@1@S@Those computers are sometimes called serial computers.@@@@1@7@@oe@19-8-2009 10011270@unknown@formal@none@1@S@An algorithm designed for such an environment is called a serial algorithm, as opposed to [[parallel algorithm]]s or [[distributed algorithms]].@@@@1@20@@oe@19-8-2009 10011280@unknown@formal@none@1@S@Parallel algorithms 
take advantage of computer architectures where several processors can work on a problem at the same time, whereas distributed algorithms utilize multiple machines connected with a [[Computer Network|network]].@@@@1@30@@oe@19-8-2009 10011290@unknown@formal@none@1@S@Parallel or distributed algorithms divide the problem into more symmetrical or asymmetrical subproblems and collect the results back together.@@@@1@19@@oe@19-8-2009 10011300@unknown@formal@none@1@S@The resource consumption in such algorithms is not only processor cycles on each processor but also the communication overhead between the processors.@@@@1@22@@oe@19-8-2009 10011310@unknown@formal@none@1@S@Sorting algorithms can be parallelized efficiently, but their communication overhead is expensive.@@@@1@12@@oe@19-8-2009 10011320@unknown@formal@none@1@S@Iterative algorithms are generally parallelizable.@@@@1@5@@oe@19-8-2009 10011330@unknown@formal@none@1@S@Some problems have no parallel algorithms, and are called inherently serial problems.@@@@1@12@@oe@19-8-2009 10011340@unknown@formal@none@1@S@* '''Deterministic''' or '''non-deterministic''': [[Deterministic algorithm]]s solve the problem with exact decision at every step of the algorithm whereas [[non-deterministic algorithm]] solve problems via guessing although typical guesses are made more accurate through the use of [[heuristics]].@@@@1@37@@oe@19-8-2009 10011350@unknown@formal@none@1@S@* '''Exact''' or '''approximate''': While many algorithms reach an exact solution, [[approximation algorithm]]s seek an approximation that is close to the true solution.@@@@1@23@@oe@19-8-2009 10011360@unknown@formal@none@1@S@Approximation may use either a deterministic or a random strategy.@@@@1@10@@oe@19-8-2009 10011370@unknown@formal@none@1@S@Such algorithms have practical value for many hard problems.@@@@1@9@@oe@19-8-2009 10011380@unknown@formal@none@1@S@=== Classification by design paradigm ===@@@@1@6@@oe@19-8-2009 10011390@unknown@formal@none@1@S@Another way of classifying algorithms is by their design methodology or paradigm.@@@@1@12@@oe@19-8-2009 10011400@unknown@formal@none@1@S@There is a certain number of paradigms, each different from the other.@@@@1@12@@oe@19-8-2009 10011410@unknown@formal@none@1@S@Furthermore, each of these categories will include many different types of algorithms.@@@@1@12@@oe@19-8-2009 10011420@unknown@formal@none@1@S@Some commonly found paradigms include:@@@@1@5@@oe@19-8-2009 10011430@unknown@formal@none@1@S@* '''Divide and conquer'''.@@@@1@4@@oe@19-8-2009 10011440@unknown@formal@none@1@S@A [[divide and conquer algorithm]] repeatedly reduces an instance of a problem to one or more smaller instances of the same problem (usually [[recursion|recursively]]), until the instances are small enough to solve easily.@@@@1@33@@oe@19-8-2009 10011450@unknown@formal@none@1@S@One such example of divide and conquer is [[mergesort|merge sorting]].@@@@1@10@@oe@19-8-2009 10011460@unknown@formal@none@1@S@Sorting can be done on each segment of data after dividing data into segments and sorting of entire data can be obtained in conquer phase by merging them.@@@@1@28@@oe@19-8-2009 10011470@unknown@formal@none@1@S@A simpler variant of divide and conquer is called '''decrease and conquer algorithm''', that solves an identical subproblem and uses the solution of this subproblem to solve the bigger problem.@@@@1@30@@oe@19-8-2009 10011480@unknown@formal@none@1@S@Divide and conquer divides the problem into multiple subproblems and so conquer stage will be more complex than decrease and 
conquer algorithms.@@@@1@22@@oe@19-8-2009 10011490@unknown@formal@none@1@S@An example of decrease and conquer algorithm is [[binary search algorithm]].@@@@1@11@@oe@19-8-2009 10011500@unknown@formal@none@1@S@* '''[[Dynamic programming]]'''.@@@@1@3@@oe@19-8-2009 10011510@unknown@formal@none@1@S@When a problem shows [[optimal substructure]], meaning the optimal solution to a problem can be constructed from optimal solutions to subproblems, and [[overlapping subproblems]], meaning the same subproblems are used to solve many different problem instances, a quicker approach called ''dynamic programming'' avoids recomputing solutions that have already been computed.@@@@1@50@@oe@19-8-2009 10011520@unknown@formal@none@1@S@For example, the shortest path to a goal from a vertex in a weighted [[graph (mathematics)|graph]] can be found by using the shortest path to the goal from all adjacent vertices.@@@@1@31@@oe@19-8-2009 10011530@unknown@formal@none@1@S@Dynamic programming and [[memoization]] go together.@@@@1@6@@oe@19-8-2009 10011540@unknown@formal@none@1@S@The main difference between dynamic programming and divide and conquer is that subproblems are more or less independent in divide and conquer, whereas subproblems overlap in dynamic programming.@@@@1@28@@oe@19-8-2009 10011550@unknown@formal@none@1@S@The difference between dynamic programming and straightforward recursion is in caching or memoization of recursive calls.@@@@1@16@@oe@19-8-2009 10011560@unknown@formal@none@1@S@When subproblems are independent and there is no repetition, memoization does not help; hence dynamic programming is not a solution for all complex problems.@@@@1@24@@oe@19-8-2009 10011570@unknown@formal@none@1@S@By using memoization or maintaining a [[Mathematical table|table]] of subproblems already solved, dynamic programming reduces the exponential nature of many problems to polynomial complexity.@@@@1@24@@oe@19-8-2009 10011580@unknown@formal@none@1@S@* '''The greedy method'''.@@@@1@4@@oe@19-8-2009 10011590@unknown@formal@none@1@S@A [[greedy algorithm]] is similar to a [[dynamic programming|dynamic programming algorithm]], but the difference is that solutions to the subproblems do not have to be known at each stage; instead a "greedy" choice can be made of what looks best for the moment.@@@@1@43@@oe@19-8-2009 10011600@unknown@formal@none@1@S@The greedy method extends the solution with the best possible decision (not all feasible decisions) at an algorithmic stage based on the current local optimum and the best decision (not all possible decisions) made in previous stage.@@@@1@37@@oe@19-8-2009 10011610@unknown@formal@none@1@S@It is not exhaustive, and does not give accurate answer to many problems.@@@@1@13@@oe@19-8-2009 10011620@unknown@formal@none@1@S@But when it works, it will be the fastest method.@@@@1@10@@oe@19-8-2009 10011630@unknown@formal@none@1@S@The most popular greedy algorithm is finding the minimal spanning tree as given by [[kruskal's algorithm|Kruskal]].@@@@1@16@@oe@19-8-2009 10011640@unknown@formal@none@1@S@* '''Linear programming'''.@@@@1@3@@oe@19-8-2009 10011650@unknown@formal@none@1@S@When solving a problem using [[linear programming]], specific [[inequality|inequalities]] involving the inputs are found and then an attempt is made to maximize (or minimize) some linear function of the inputs.@@@@1@30@@oe@19-8-2009 10011660@unknown@formal@none@1@S@Many problems (such as the [[Maximum flow problem|maximum flow]] for directed [[graph (mathematics)|graphs]]) can be stated in a linear programming way, and then be 
solved by a 'generic' algorithm such as the [[simplex algorithm]].@@@@1@34@@oe@19-8-2009 10011670@unknown@formal@none@1@S@A more complex variant of linear programming is called integer programming, where the solution space is restricted to the [[integers]].@@@@1@20@@oe@19-8-2009 10011680@unknown@formal@none@1@S@* '''[[Reduction (complexity)|Reduction]]'''.@@@@1@3@@oe@19-8-2009 10011690@unknown@formal@none@1@S@This technique involves solving a difficult problem by transforming it into a better known problem for which we have (hopefully) [[asymptotically optimal]] algorithms.@@@@1@23@@oe@19-8-2009 10011700@unknown@formal@none@1@S@The goal is to find a reducing algorithm whose [[Computational complexity theory|complexity]] is not dominated by the resulting reduced algorithm's.@@@@1@20@@oe@19-8-2009 10011710@unknown@formal@none@1@S@For example, one [[selection algorithm]] for finding the median in an unsorted list involves first sorting the list (the expensive portion) and then pulling out the middle element in the sorted list (the cheap portion).@@@@1@35@@oe@19-8-2009 10011720@unknown@formal@none@1@S@This technique is also known as ''transform and conquer''.@@@@1@9@@oe@19-8-2009 10011730@unknown@formal@none@1@S@* '''Search and enumeration'''.@@@@1@4@@oe@19-8-2009 10011740@unknown@formal@none@1@S@Many problems (such as playing [[chess]]) can be modeled as problems on [[graph theory|graphs]].@@@@1@14@@oe@19-8-2009 10011750@unknown@formal@none@1@S@A [[graph exploration algorithm]] specifies rules for moving around a graph and is useful for such problems.@@@@1@17@@oe@19-8-2009 10011760@unknown@formal@none@1@S@This category also includes [[search algorithm]]s, [[branch and bound]] enumeration and [[backtracking]].@@@@1@12@@oe@19-8-2009 10011770@unknown@formal@none@1@S@* '''The probabilistic and heuristic paradigm'''.@@@@1@6@@oe@19-8-2009 10011780@unknown@formal@none@1@S@Algorithms belonging to this class fit the definition of an algorithm more loosely.@@@@1@13@@oe@19-8-2009 10011790@unknown@formal@none@1@S@# [[Probabilistic algorithm]]s are those that make some choices randomly (or pseudo-randomly); for some problems, it can in fact be proven that the fastest solutions must involve some [[randomness]].@@@@1@29@@oe@19-8-2009 10011800@unknown@formal@none@1@S@# [[Genetic algorithm]]s attempt to find solutions to problems by mimicking biological [[evolution]]ary processes, with a cycle of random mutations yielding successive generations of "solutions".@@@@1@25@@oe@19-8-2009 10011810@unknown@formal@none@1@S@Thus, they emulate reproduction and "survival of the fittest".@@@@1@9@@oe@19-8-2009 10011820@unknown@formal@none@1@S@In [[genetic programming]], this approach is extended to algorithms, by regarding the algorithm itself as a "solution" to a problem.@@@@1@20@@oe@19-8-2009 10011830@unknown@formal@none@1@S@# [[Heuristic]] algorithms, whose general purpose is not to find an optimal solution, but an approximate solution where the time or resources are limited.@@@@1@24@@oe@19-8-2009 10011840@unknown@formal@none@1@S@They are not practical to find perfect solutions.@@@@1@8@@oe@19-8-2009 10011850@unknown@formal@none@1@S@An example of this would be [[local search (optimization)|local search]], [[tabu search]], or [[simulated annealing]] algorithms, a class of heuristic probabilistic algorithms that vary the solution of a problem by a random amount.@@@@1@33@@oe@19-8-2009 10011860@unknown@formal@none@1@S@The name "[[simulated annealing]]" alludes to the metallurgic term meaning the heating and cooling of metal 
to achieve freedom from defects.@@@@1@21@@oe@19-8-2009 10011870@unknown@formal@none@1@S@The purpose of the random variance is to find close to globally optimal solutions rather than simply locally optimal ones, the idea being that the random element will be decreased as the algorithm settles down to a solution.@@@@1@38@@oe@19-8-2009 10011880@unknown@formal@none@1@S@=== Classification by field of study ===@@@@1@7@@oe@19-8-2009 10011890@unknown@formal@none@1@S@Every field of science has its own problems and needs efficient algorithms.@@@@1@12@@oe@19-8-2009 10011900@unknown@formal@none@1@S@Related problems in one field are often studied together.@@@@1@9@@oe@19-8-2009 10011910@unknown@formal@none@1@S@Some example classes are [[search algorithm]]s, [[sorting algorithm]]s, [[merge algorithm]]s, [[numerical analysis|numerical algorithms]], [[graph theory|graph algorithms]], [[string algorithms]], [[computational geometry|computational geometric algorithms]], [[combinatorial|combinatorial algorithms]], [[machine learning]], [[cryptography]], [[data compression]] algorithms and [[parsing|parsing techniques]].@@@@1@33@@oe@19-8-2009 10011920@unknown@formal@none@1@S@Fields tend to overlap with each other, and algorithm advances in one field may improve those of other, sometimes completely unrelated, fields.@@@@1@22@@oe@19-8-2009 10011930@unknown@formal@none@1@S@For example, dynamic programming was originally invented for optimization of resource consumption in industry, but is now used in solving a broad range of problems in many fields.@@@@1@28@@oe@19-8-2009 10011940@unknown@formal@none@1@S@=== Classification by complexity ===@@@@1@5@@oe@19-8-2009 10011950@unknown@formal@none@1@S@Algorithms can be classified by the amount of time they need to complete compared to their input size.@@@@1@18@@oe@19-8-2009 10011960@unknown@formal@none@1@S@There is a wide variety: some algorithms complete in linear time relative to input size, some do so in an exponential amount of time or even worse, and some never halt.@@@@1@31@@oe@19-8-2009 10011970@unknown@formal@none@1@S@Additionally, some problems may have multiple algorithms of differing complexity, while other problems might have no algorithms or no known efficient algorithms.@@@@1@22@@oe@19-8-2009 10011980@unknown@formal@none@1@S@There are also mappings from some problems to other problems.@@@@1@10@@oe@19-8-2009 10011990@unknown@formal@none@1@S@Owing to this, it was found to be more suitable to classify the problems themselves instead of the algorithms into equivalence classes based on the complexity of the best possible algorithms for them.@@@@1@33@@oe@19-8-2009 10012000@unknown@formal@none@1@S@=== Classification by computing power ===@@@@1@6@@oe@19-8-2009 10012010@unknown@formal@none@1@S@Another way to classify algorithms is by computing power.@@@@1@9@@oe@19-8-2009 10012020@unknown@formal@none@1@S@This is typically done by considering some collection (class) of algorithms.@@@@1@11@@oe@19-8-2009 10012030@unknown@formal@none@1@S@A recursive class of algorithms is one that includes algorithms for all Turing computable functions.@@@@1@15@@oe@19-8-2009 10012040@unknown@formal@none@1@S@Looking at classes of algorithms allows for the possibility of restricting the available computational resources (time and memory) used in a computation.@@@@1@22@@oe@19-8-2009 10012050@unknown@formal@none@1@S@A subrecursive class of algorithms is one in which not all Turing computable functions can be obtained.@@@@1@17@@oe@19-8-2009 10012060@unknown@formal@none@1@S@For example, the 
algorithms that run in [[P (complexity)|polynomial time]] suffice for many important types of computation but do not exhaust all Turing computable functions.@@@@1@25@@oe@19-8-2009 10012070@unknown@formal@none@1@S@The class algorithms implemented by [[primitive recursive function]]s is another subrecursive class.@@@@1@12@@oe@19-8-2009 10012080@unknown@formal@none@1@S@Burgin (2005, p. 24) uses a generalized definition of algorithms that relaxes the common requirement that the output of the algorithm that computes a function must be determined after a finite number of steps.@@@@1@34@@oe@19-8-2009 10012090@unknown@formal@none@1@S@He defines a super-recursive class of algorithms as "a class of algorithms in which it is possible to compute functions not computable by any Turing machine" (Burgin 2005, p. 107).@@@@1@30@@oe@19-8-2009 10012100@unknown@formal@none@1@S@This is closely related to the study of methods of [[hypercomputation]].@@@@1@11@@oe@19-8-2009 10012110@unknown@formal@none@1@S@== Legal issues ==@@@@1@4@@oe@19-8-2009 10012120@unknown@formal@none@1@S@:''See also: [[Software patents]] for a general overview of the patentability of software, including computer-implemented algorithms.''@@@@1@16@@oe@19-8-2009 10012130@unknown@formal@none@1@S@Algorithms, by themselves, are not usually patentable.@@@@1@7@@oe@19-8-2009 10012140@unknown@formal@none@1@S@In the [[United States]], a claim consisting solely of simple manipulations of abstract concepts, numbers, or signals do not constitute "processes" (USPTO 2006) and hence algorithms are not patentable (as in [[Gottschalk v. Benson]]).@@@@1@34@@oe@19-8-2009 10012150@unknown@formal@none@1@S@However, practical applications of algorithms are sometimes patentable.@@@@1@8@@oe@19-8-2009 10012160@unknown@formal@none@1@S@For example, in [[Diamond v. 
Diehr]], the application of a simple [[feedback]] algorithm to aid in the curing of [[synthetic rubber]] was deemed patentable.@@@@1@24@@oe@19-8-2009 10012170@unknown@formal@none@1@S@The [[Software patent debate|patenting of software]] is highly controversial, and there are highly criticized patents involving algorithms, especially [[data compression]] algorithms, such as [[Unisys]]' [[Graphics Interchange Format#Unisys and LZW patent enforcement|LZW patent]].@@@@1@32@@oe@19-8-2009 10012180@unknown@formal@none@1@S@Additionally, some cryptographic algorithms have export restrictions (see [[export of cryptography]]).@@@@1@11@@oe@19-8-2009 10012190@unknown@formal@none@1@S@== History: Development of the notion of "algorithm" ==@@@@1@9@@oe@19-8-2009 10012200@unknown@formal@none@1@S@=== Origin of the word ===@@@@1@6@@oe@19-8-2009 10012210@unknown@formal@none@1@S@The word ''algorithm'' comes from the name of the 9th century [[Persian people|Persian]] mathematician [[al-Khwarizmi|Abu Abdullah Muhammad ibn Musa al-Khwarizmi]] whose works introduced Indian numerals and algebraic concepts.@@@@1@28@@oe@19-8-2009 10012220@unknown@formal@none@1@S@He worked in [[Baghdad]] at the time when it was the centre of scientific studies and trade.@@@@1@17@@oe@19-8-2009 10012230@unknown@formal@none@1@S@The word ''[[algorism]]'' originally referred only to the rules of performing [[arithmetic]] using [[Hindu-Arabic numeral system|Arabic numerals]] but evolved via European Latin translation of al-Khwarizmi's name into ''algorithm'' by the 18th century.@@@@1@32@@oe@19-8-2009 10012240@unknown@formal@none@1@S@The word evolved to include all definite procedures for solving problems or performing tasks.@@@@1@14@@oe@19-8-2009 10012250@unknown@formal@none@1@S@=== Discrete and distinguishable symbols ===@@@@1@6@@oe@19-8-2009 10012260@unknown@formal@none@1@S@'''Tally-marks''': To keep track of their flocks, their sacks of grain and their money the ancients used tallying: accumulating stones or marks scratched on sticks, or making discrete symbols in clay.@@@@1@31@@oe@19-8-2009 10012270@unknown@formal@none@1@S@Through the Babylonian and Egyptian use of marks and symbols, eventually [[Roman numerals]] and the [[abacus]] evolved (Dilson, p.16–41).@@@@1@19@@oe@19-8-2009 10012280@unknown@formal@none@1@S@Tally marks appear prominently in [[unary numeral system]] arithmetic used in [[Turing machine]] and [[Post-Turing machine]] computations.@@@@1@17@@oe@19-8-2009 10012290@unknown@formal@none@1@S@=== Manipulation of symbols as "place holders" for numbers: algebra ===@@@@1@11@@oe@19-8-2009 10012300@unknown@formal@none@1@S@The work of the Ancient Greek geometers, Persian mathematician [[Al-Khwarizmi]] (often considered as the "father of [[algebra]]"), and Western European mathematicians culminated in [[Leibniz]]'s notion of the [[calculus ratiocinator]] (ca 1680):@@@@1@31@@oe@19-8-2009 10012310@unknown@formal@none@1@S@:"A good century and a half ahead of his time, Leibniz proposed an algebra of logic, an algebra that would specify the rules for manipulating logical concepts in the manner that ordinary algebra specifies the rules for manipulating numbers" (Davis 2000:1)@@@@1@41@@oe@19-8-2009 10012320@unknown@formal@none@1@S@=== Mechanical contrivances with discrete states ===@@@@1@7@@oe@19-8-2009 10012330@unknown@formal@none@1@S@'''The clock''': Bolter credits the invention of the weight-driven [[clock]] as “The key invention [of Europe in the Middle Ages]", in particular the [[verge escapement]]< (Bolter 1984:24) that provides us with 
the tick and tock of a mechanical clock.@@@@1@39@@oe@19-8-2009 10012340@unknown@formal@none@1@S@“The accurate automatic machine” (Bolter 1984:26) led immediately to "mechanical [[automata]]" beginning in the thirteenth century and finally to “computational machines" – the [[difference engine]] and [[analytical engine]]s of [[Charles Babbage]] and Countess [[Ada Lovelace]] (Bolter p.33–34, p.204–206).@@@@1@38@@oe@19-8-2009 10012350@unknown@formal@none@1@S@'''Jacquard loom, Hollerith punch cards, telegraphy and telephony — the electromechanical relay''': Bell and Newell (1971) indicate that the [[Jacquard loom]] (1801), precursor to [[Hollerith cards]] (punch cards, 1887), and “telephone switching technologies” were the roots of a tree leading to the development of the first computers (Bell and Newell diagram p. 39, cf Davis 2000).@@@@1@56@@oe@19-8-2009 10012360@unknown@formal@none@1@S@By the mid-1800s the [[telegraph]], the precursor of the telephone, was in use throughout the world, its discrete and distinguishable encoding of letters as “dots and dashes” a common sound.@@@@1@30@@oe@19-8-2009 10012370@unknown@formal@none@1@S@By the late 1800s the [[ticker tape]] (ca 1870s) was in use, as was the use of [[Hollerith cards]] in the 1890 U.S. census.@@@@1@24@@oe@19-8-2009 10012380@unknown@formal@none@1@S@Then came the [[Teletype]] (ca 1910) with its punched-paper use of [[Baudot code]] on tape.@@@@1@15@@oe@19-8-2009 10012390@unknown@formal@none@1@S@Telephone-switching networks of electromechanical [[relay]]s (invented 1835) was behind the work of [[George Stibitz]] (1937), the inventor of the digital adding device.@@@@1@22@@oe@19-8-2009 10012400@unknown@formal@none@1@S@As he worked in Bell Laboratories, he observed the “burdensome’ use of mechanical calculators with gears.@@@@1@16@@oe@19-8-2009 10012410@unknown@formal@none@1@S@"He went home one evening in 1937 intending to test his idea....@@@@1@12@@oe@19-8-2009 10012420@unknown@formal@none@1@S@When the tinkering was over, Stibitz had constructed a binary adding device".@@@@1@12@@oe@19-8-2009 10012430@unknown@formal@none@1@S@(Valley News, p. 13).@@@@1@4@@oe@19-8-2009 10012440@unknown@formal@none@1@S@Davis (2000) observes the particular importance of the electromechanical relay (with its two "binary states" ''open'' and ''closed''):@@@@1@18@@oe@19-8-2009 10012450@unknown@formal@none@1@S@: It was only with the development, beginning in the 1930s, of electromechanical calculators using electrical relays, that machines were built having the scope Babbage had envisioned."@@@@1@27@@oe@19-8-2009 10012460@unknown@formal@none@1@S@(Davis, p. 14).@@@@1@3@@oe@19-8-2009 10012470@unknown@formal@none@1@S@=== Mathematics during the 1800s up to the mid-1900s ===@@@@1@10@@oe@19-8-2009 10012480@unknown@formal@none@1@S@'''Symbols and rules''': In rapid succession the mathematics of [[George Boole]] (1847, 1854), [[Gottlob Frege]] (1879), and [[Giuseppe Peano]] (1888–1889) reduced arithmetic to a sequence of symbols manipulated by rules.@@@@1@30@@oe@19-8-2009 10012490@unknown@formal@none@1@S@Peano's ''The principles of arithmetic, presented by a new method'' (1888) was "the first attempt at an axiomatization of mathematics in a symbolic language" (van Heijenoort:81ff).@@@@1@26@@oe@19-8-2009 10012500@unknown@formal@none@1@S@But Heijenoort gives Frege (1879) this kudos: Frege’s is "perhaps the most important single work ever written in logic. ... 
in which we see a " 'formula language', that is a ''lingua characterica'', a language written with special symbols, "for pure thought", that is, free from rhetorical embellishments ... constructed from specific symbols that are manipulated according to definite rules" (van Heijenoort:1).@@@@1@62@@oe@19-8-2009 10012510@unknown@formal@none@1@S@The work of Frege was further simplified and amplified by [[Alfred North Whitehead]] and [[Bertrand Russell]] in their [[Principia Mathematica]] (1910–1913).@@@@1@21@@oe@19-8-2009 10012520@unknown@formal@none@1@S@'''The paradoxes''': At the same time a number of disturbing paradoxes appeared in the literature, in particular the [[Burali-Forti paradox]] (1897), the [[Russell paradox]] (1902–03), and the [[Richard Paradox]] (Dixon 1906, cf Kleene 1952:36–40).@@@@1@34@@oe@19-8-2009 10012530@unknown@formal@none@1@S@The resultant considerations led to [[Kurt Gödel]]’s paper (1931) — he specifically cites the paradox of the liar — that completely reduces rules of [[recursion]] to numbers.@@@@1@27@@oe@19-8-2009 10012540@unknown@formal@none@1@S@'''Effective calculability''': In an effort to solve the [[Entscheidungsproblem]] defined precisely by Hilbert in 1928, mathematicians first set about to define what was meant by an "effective method" or "effective calculation" or "effective calculability" (i.e., a calculation that would succeed).@@@@1@40@@oe@19-8-2009 10012550@unknown@formal@none@1@S@In rapid succession the following appeared: [[Alonzo Church]], [[Stephen Kleene]] and [[J.B. Rosser]]'s [[λ-calculus]], (cf footnote in [[Alonzo Church]] 1936a:90, 1936b:110) a finely-honed definition of "general recursion" from the work of Gödel acting on suggestions of [[Jacques Herbrand]] (cf Gödel's Princeton lectures of 1934) and subsequent simplifications by Kleene (1935-6:237ff, 1943:255ff). Church's proof (1936:88ff) that the [[Entscheidungsproblem]] was unsolvable, [[Emil Post]]'s definition of effective calculability as a worker mindlessly following a list of instructions to move left or right through a sequence of rooms and while there either mark or erase a paper or observe the paper and make a yes-no decision about the next instruction (cf "Formulation I", Post 1936:289-290).@@@@1@111@@oe@19-8-2009 10012560@unknown@formal@none@1@S@[[Alan Turing]]'s proof of that the Entscheidungsproblem was unsolvable by use of his "a- [automatic-] machine"(Turing 1936-7:116ff) -- in effect almost identical to Post's "formulation", [[J. Barkley Rosser]]'s definition of "effective method" in terms of "a machine" (Rosser 1939:226).@@@@1@39@@oe@19-8-2009 10012570@unknown@formal@none@1@S@[[S. C. 
Kleene]]'s proposal of a precursor to "[[Church thesis]]" that he called "Thesis I" (Kleene 1943:273–274), and a few years later Kleene's renaming his Thesis "Church's Thesis" (Kleene 1952:300, 317) and proposing "Turing's Thesis" (Kleene 1952:376).@@@@1@37@@oe@19-8-2009 10012580@unknown@formal@none@1@S@=== Emil Post (1936) and Alan Turing (1936-7, 1939)===@@@@1@9@@oe@19-8-2009 10012590@unknown@formal@none@1@S@Here is a remarkable coincidence of two men not knowing each other but describing a process of men-as-computers working on computations — and they yield virtually identical definitions.@@@@1@28@@oe@19-8-2009 10012600@unknown@formal@none@1@S@[[Emil Post]] (1936) described the actions of a "computer" (human being) as follows:@@@@1@13@@oe@19-8-2009 10012610@unknown@formal@none@1@S@:"...two concepts are involved: that of a ''symbol space'' in which the work leading from problem to answer is to be carried out, and a fixed unalterable ''set of directions''.@@@@1@30@@oe@19-8-2009 10012620@unknown@formal@none@1@S@His symbol space would be@@@@1@5@@oe@19-8-2009 10012630@unknown@formal@none@1@S@:"a two way infinite sequence of spaces or boxes...@@@@1@9@@oe@19-8-2009 10012640@unknown@formal@none@1@S@The problem solver or worker is to move and work in this symbol space, being capable of being in, and operating in but one box at a time.... a box is to admit of but two possible conditions, i.e., being empty or unmarked, and having a single mark in it, say a vertical stroke.@@@@1@54@@oe@19-8-2009 10012650@unknown@formal@none@1@S@:"One box is to be singled out and called the starting point. ...a specific problem is to be given in symbolic form by a finite number of boxes [i.e., INPUT] being marked with a stroke.@@@@1@35@@oe@19-8-2009 10012660@unknown@formal@none@1@S@Likewise the answer [i.e., OUTPUT] is to be given in symbolic form by such a configuration of marked boxes....@@@@1@19@@oe@19-8-2009 10012670@unknown@formal@none@1@S@:"A set of directions applicable to a general problem sets up a deterministic process when applied to each specific problem.@@@@1@20@@oe@19-8-2009 10012680@unknown@formal@none@1@S@This process will terminate only when it comes to the direction of type (C ) [i.e., STOP]." (U p. 289–290)@@@@1@20@@oe@19-8-2009 10012685@unknown@formal@none@1@S@See more at [[Post-Turing machine]]@@@@1@5@@oe@19-8-2009 10012690@unknown@formal@none@1@S@[[Alan Turing]]’s work (1936, 1939:160) preceded that of Stibitz (1937); it is unknown whether Stibitz knew of the work of Turing.@@@@1@21@@oe@19-8-2009 10012700@unknown@formal@none@1@S@Turing’s biographer believed that Turing’s use of a typewriter-like model derived from a youthful interest: “Alan had dreamt of inventing typewriters as a boy; Mrs. Turing had a typewriter; and he could well have begun by asking himself what was meant by calling a typewriter 'mechanical'" (Hodges, p. 
96).@@@@1@49@@oe@19-8-2009 10012710@unknown@formal@none@1@S@Given the prevalence of Morse code and telegraphy, ticker tape machines, and Teletypes, we might conjecture that all were influences.@@@@1@20@@oe@19-8-2009 10012720@unknown@formal@none@1@S@Turing — his model of computation is now called a [[Turing machine]] — begins, as did Post, with an analysis of a human computer that he whittles down to a simple set of basic motions and "states of mind".@@@@1@39@@oe@19-8-2009 10012730@unknown@formal@none@1@S@But he continues a step further and creates a machine as a model of computation of numbers (Turing 1936-7:116).@@@@1@19@@oe@19-8-2009 10012740@unknown@formal@none@1@S@:"Computing is normally done by writing certain symbols on paper.@@@@1@10@@oe@19-8-2009 10012750@unknown@formal@none@1@S@We may suppose this paper is divided into squares like a child's arithmetic book....I assume then that the computation is carried out on one-dimensional paper, i.e., on a tape divided into squares.@@@@1@32@@oe@19-8-2009 10012760@unknown@formal@none@1@S@I shall also suppose that the number of symbols which may be printed is finite....@@@@1@15@@oe@19-8-2009 10012770@unknown@formal@none@1@S@:"The behavior of the computer at any moment is determined by the symbols which he is observing, and his "state of mind" at that moment.@@@@1@25@@oe@19-8-2009 10012780@unknown@formal@none@1@S@We may suppose that there is a bound B to the number of symbols or squares which the computer can observe at one moment.@@@@1@24@@oe@19-8-2009 10012790@unknown@formal@none@1@S@If he wishes to observe more, he must use successive observations.@@@@1@11@@oe@19-8-2009 10012800@unknown@formal@none@1@S@We will also suppose that the number of states of mind which need be taken into account is finite...@@@@1@19@@oe@19-8-2009 10012810@unknown@formal@none@1@S@:"Let us imagine the operations performed by the computer to be split up into 'simple operations' which are so elementary that it is not easy to imagine them further divided" (Turing 1936-7:136).@@@@1@33@@oe@19-8-2009 10012820@unknown@formal@none@1@S@Turing's reduction yields the following:@@@@1@5@@oe@19-8-2009 10012830@unknown@formal@none@1@S@:"The simple operations must therefore include:@@@@1@6@@oe@19-8-2009 10012840@unknown@formal@none@1@S@::"(a) Changes of the symbol on one of the observed squares@@@@1@11@@oe@19-8-2009 10012850@unknown@formal@none@1@S@::"(b) Changes of one of the squares observed to another square within L squares of one of the previously observed squares.@@@@1@21@@oe@19-8-2009 10012860@unknown@formal@none@1@S@"It may be that some of these changes necessarily involve a change of state of mind.@@@@1@16@@oe@19-8-2009 10012870@unknown@formal@none@1@S@The most general single operation must therefore be taken to be one of the following:@@@@1@15@@oe@19-8-2009 10012880@unknown@formal@none@1@S@::"(A) A possible change (a) of symbol together with a possible change of state of mind.@@@@1@16@@oe@19-8-2009 10012890@unknown@formal@none@1@S@::"(B) A possible change (b) of observed squares, together with a possible change of state of mind"@@@@1@17@@oe@19-8-2009 10012900@unknown@formal@none@1@S@:"We may now construct a machine to do the work of this computer."@@@@1@13@@oe@19-8-2009 10012910@unknown@formal@none@1@S@(Turing 1936-7:136)@@@@1@2@@oe@19-8-2009 10012920@unknown@formal@none@1@S@A few years later, Turing expanded his analysis (thesis, definition) with this forceful expression of it:@@@@1@16@@oe@19-8-2009 10012930@unknown@formal@none@1@S@:"A function is said to be
"effectively calculable" if its values can be found by some purely mechanical process.@@@@1@19@@oe@19-8-2009 10012940@unknown@formal@none@1@S@Although it is fairly easy to get an intuitive grasp of this idea, it is nevertheless desirable to have some more definite, mathematically expressible definition . . . [he discusses the history of the definition pretty much as presented above with respect to Gödel, Herbrand, Kleene, Church, Turing and Post] . . .@@@@1@53@@oe@19-8-2009 10012950@unknown@formal@none@1@S@We may take this statement literally, understanding by a purely mechanical process one which could be carried out by a machine.@@@@1@21@@oe@19-8-2009 10012960@unknown@formal@none@1@S@It is possible to give a mathematical description, in a certain normal form, of the structures of these machines.@@@@1@19@@oe@19-8-2009 10012970@unknown@formal@none@1@S@The development of these ideas leads to the author's definition of a computable function, and to an identification of computability † with effective calculability . . . .@@@@1@28@@oe@19-8-2009 10012980@unknown@formal@none@1@S@::"† We shall use the expression "computable function" to mean a function calculable by a machine, and we let "effectively calculable" refer to the intuitive idea without particular identification with any one of these definitions." (Turing 1939:160)@@@@1@36@@oe@19-8-2009 10012990@unknown@formal@none@1@S@=== J. B. Rosser (1939) and S. C. Kleene (1943) ===@@@@1@11@@oe@19-8-2009 10013000@unknown@formal@none@1@S@'''[[J. Barkley Rosser]]''' boldly defined an ‘effective [mathematical] method’ in the following manner (boldface added):@@@@1@15@@oe@19-8-2009 10013010@unknown@formal@none@1@S@:"'Effective method' is used here in the rather special sense of a method each step of which is precisely determined and which is certain to produce the answer in a finite number of steps.@@@@1@34@@oe@19-8-2009 10013020@unknown@formal@none@1@S@With this special meaning, three different precise definitions have been given to date. [his footnote #5; see discussion immediately below].@@@@1@20@@oe@19-8-2009 10013030@unknown@formal@none@1@S@The simplest of these to state (due to Post and Turing) says essentially that '''an effective method of solving certain sets of problems exists if one can build a machine which will then solve any problem of the set with no human intervention beyond inserting the question and (later) reading the answer'''.@@@@1@52@@oe@19-8-2009 10013040@unknown@formal@none@1@S@All three definitions are equivalent, so it doesn't matter which one is used.@@@@1@13@@oe@19-8-2009 10013050@unknown@formal@none@1@S@Moreover, the fact that all three are equivalent is a very strong argument for the correctness of any one."@@@@1@19@@oe@19-8-2009 10013060@unknown@formal@none@1@S@(Rosser 1939:225–6)@@@@1@2@@oe@19-8-2009 10013070@unknown@formal@none@1@S@Rosser's footnote #5 references the work of (1) Church and Kleene and their definition of λ-definability, in particular Church's use of it in his ''An Unsolvable Problem of Elementary Number Theory'' (1936); (2) Herbrand and Gödel and their use of recursion, in particular Gödel's use in his famous paper ''On Formally Undecidable Propositions of Principia Mathematica and Related Systems I'' (1931); and (3) Post (1936) and Turing (1936-7) in their mechanism-models of computation.@@@@1@73@@oe@19-8-2009 10013080@unknown@formal@none@1@S@'''[[Stephen C.
Kleene]]''' defined his now-famous "Thesis I", later known as the [[Church-Turing thesis]].@@@@1@14@@oe@19-8-2009 10013090@unknown@formal@none@1@S@But he did this in the following context (boldface in original):@@@@1@11@@oe@19-8-2009 10013100@unknown@formal@none@1@S@:"12.@@@@1@1@@oe@19-8-2009 10013110@unknown@formal@none@1@S@'''Algorithmic theories'''...@@@@1@2@@oe@19-8-2009 10013120@unknown@formal@none@1@S@In setting up a complete algorithmic theory, what we do is to describe a procedure, performable for each set of values of the independent variables, which procedure necessarily terminates and in such manner that from the outcome we can read a definite answer, "yes" or "no," to the question, "is the predicate value true?”"@@@@1@54@@oe@19-8-2009 10013130@unknown@formal@none@1@S@(Kleene 1943:273)@@@@1@2@@oe@19-8-2009 10013140@unknown@formal@none@1@S@=== History after 1950 ===@@@@1@5@@oe@19-8-2009 10013150@unknown@formal@none@1@S@A number of efforts have been directed toward further refinement of the definition of "algorithm", and activity is on-going because of issues surrounding, in particular, [[foundations of mathematics]] (especially the [[Church-Turing Thesis]]) and [[philosophy of mind]] (especially arguments around [[artificial intelligence]]).@@@@1@41@@oe@19-8-2009 10013160@unknown@formal@none@1@S@For more, see [[Algorithm characterizations]].@@@@1@5@@oe@19-8-2009 10013170@unknown@formal@none@1@S@==Algorithmic Repositories==@@@@1@2@@oe@19-8-2009 10013180@unknown@formal@none@1@S@*LEDA@@@@1@1@@oe@19-8-2009 10013190@unknown@formal@none@1@S@*Stanford GraphBase@@@@1@2@@oe@19-8-2009 10013200@unknown@formal@none@1@S@*Combinatorica@@@@1@1@@oe@19-8-2009 10013210@unknown@formal@none@1@S@*Netlib@@@@1@1@@oe@19-8-2009 10013220@unknown@formal@none@1@S@*XTango@@@@1@1@@oe@19-8-2009 10020010@unknown@formal@none@1@S@
Ambiguity
@@@@1@1@@oe@19-8-2009 10020020@unknown@formal@none@1@S@'''Ambiguity''' is the property of being '''ambiguous''', where a [[word]], term, notation, sign, [[symbol]], [[phrase]], [[Sentence (linguistics)|sentence]], or any other form used for [[communication]], is called ambiguous if it can be interpreted in more than one way.@@@@1@37@@oe@19-8-2009 10020030@unknown@formal@none@1@S@Ambiguity is distinct from ''[[vagueness]]'', which arises when the boundaries of meaning are indistinct.@@@@1@14@@oe@19-8-2009 10020040@unknown@formal@none@1@S@Ambiguity is context-dependent: the same communication may be ambiguous in one context and unambiguous in another context.@@@@1@17@@oe@19-8-2009 10020050@unknown@formal@none@1@S@For a word, ambiguity typically refers to an unclear choice between different definitions as may be found in a [[dictionary]].@@@@1@20@@oe@19-8-2009 10020060@unknown@formal@none@1@S@A sentence may be ambiguous due to different ways of [[parsing]] the same sequence of words.@@@@1@16@@oe@19-8-2009 10020070@unknown@formal@none@1@S@== Linguistic forms ==@@@@1@4@@oe@19-8-2009 10020080@unknown@formal@none@1@S@'''[[Polysemy|Lexical ambiguity]]''' arises when [[context]] is insufficient to determine the sense of a single word that has more than one meaning.@@@@1@21@@oe@19-8-2009 10020090@unknown@formal@none@1@S@For example, the word “bank” has several distinct definitions, including “financial institution” and “edge of a river,” but if someone says “I deposited $100 in the bank,” most people would not think you used a shovel to dig in the mud.@@@@1@41@@oe@19-8-2009 10020100@unknown@formal@none@1@S@The word "run" has 130 ambiguous definitions in some [[lexicon]]s.@@@@1@10@@oe@19-8-2009 10020110@unknown@formal@none@1@S@"Biweekly" can mean "fortnightly" (once every two weeks - 26 times a year), OR "twice a week" (104 times a year).@@@@1@21@@oe@19-8-2009 10020120@unknown@formal@none@1@S@Stating a specific context like "meeting schedule" does NOT disambiguate "biweekly."@@@@1@11@@oe@19-8-2009 10020130@unknown@formal@none@1@S@Many people believe that such lexically-ambiguous, miscommunication-prone words should be avoided altogether, since the user generally has to waste time, effort, and [[attention span]] to define what is meant when they are used.@@@@1@33@@oe@19-8-2009 10020140@unknown@formal@none@1@S@The use of multi-defined words requires the author or speaker to clarify their context, and sometimes elaborate on their specific intended meaning (in which case, a less ambiguous term should have been used).@@@@1@33@@oe@19-8-2009 10020150@unknown@formal@none@1@S@The goal of clear concise communication is that the receiver(s) have no misunderstanding about what was meant to be conveyed.@@@@1@20@@oe@19-8-2009 10020160@unknown@formal@none@1@S@An exception to this could include a politician whose "wiggle words" and [[obfuscation]] are necessary to gain support from multiple [[constituent (politics)]] with [[mutually exclusive]] conflicting desires from their candidate of choice.@@@@1@32@@oe@19-8-2009 10020170@unknown@formal@none@1@S@Ambiguity is a powerful tool of [[political science]].@@@@1@8@@oe@19-8-2009 10020180@unknown@formal@none@1@S@More problematic are words whose senses express closely-related concepts.@@@@1@9@@oe@19-8-2009 10020190@unknown@formal@none@1@S@“Good,” for example, can mean “useful” or “functional” (''That’s a good hammer''), “exemplary” (''She’s a good student''), “pleasing” (''This is good soup''), “moral” (''a good person'' versus ''the lesson to be learned from a story''), 
"[[righteous]]", etc.@@@@1@37@@oe@19-8-2009 10020200@unknown@formal@none@1@S@“I have a good daughter” is not clear about which sense is intended.@@@@1@13@@oe@19-8-2009 10020210@unknown@formal@none@1@S@The various ways to apply [[prefix]]es and [[suffix]]es can also create ambiguity (“unlockable” can mean “capable of being unlocked” or “impossible to lock”, and therefore should not be used).@@@@1@29@@oe@19-8-2009 10020220@unknown@formal@none@1@S@'''[[Syntactic ambiguity]]''' arises when a sentence can be [[parsing|parsed]] in more than one way.@@@@1@14@@oe@19-8-2009 10020230@unknown@formal@none@1@S@“He ate the cookies on the couch,” for example, could mean that he ate those cookies which were on the couch (as opposed to those that were on the table), or it could mean that he was sitting on the couch when he ate the cookies.@@@@1@46@@oe@19-8-2009 10020240@unknown@formal@none@1@S@[[Spoken language]] can contain many more types of ambiguities, where there is more than one way to compose a set of sounds into words, for example “ice cream” and “I scream.”@@@@1@31@@oe@19-8-2009 10020250@unknown@formal@none@1@S@Such ambiguity is generally resolved based on the context.@@@@1@9@@oe@19-8-2009 10020260@unknown@formal@none@1@S@A mishearing of such, based on incorrectly-resolved ambiguity, is called a [[mondegreen]].@@@@1@12@@oe@19-8-2009 10020270@unknown@formal@none@1@S@'''[[Meaning (non-linguistic)|Semantic ambiguity]]''' arises when a word or concept has an inherently diffuse meaning based on widespread or informal usage.@@@@1@20@@oe@19-8-2009 10020280@unknown@formal@none@1@S@This is often the case, for example, with idiomatic expressions whose definitions are rarely or never well-defined, and are presented in the context of a larger argument that invites a conclusion.@@@@1@31@@oe@19-8-2009 10020290@unknown@formal@none@1@S@For example, “You could do with a new automobile.@@@@1@9@@oe@19-8-2009 10020300@unknown@formal@none@1@S@How about a test drive?”@@@@1@5@@oe@19-8-2009 10020310@unknown@formal@none@1@S@The clause “You could do with” presents a statement with such wide possible interpretation as to be essentially meaningless.@@@@1@19@@oe@19-8-2009 10020320@unknown@formal@none@1@S@Lexical ambiguity is contrasted with semantic ambiguity.@@@@1@7@@oe@19-8-2009 10020330@unknown@formal@none@1@S@The former represents a choice between a finite number of known and meaningful context-dependent interpretations.@@@@1@15@@oe@19-8-2009 10020340@unknown@formal@none@1@S@The latter represents a choice between any number of possible interpretations, none of which may have a standard agreed-upon meaning.@@@@1@20@@oe@19-8-2009 10020350@unknown@formal@none@1@S@This form of ambiguity is closely related to [[vagueness]].@@@@1@9@@oe@19-8-2009 10020360@unknown@formal@none@1@S@Linguistic ambiguity can be a problem in law (see [[Ambiguity (law)]]), because the interpretation of written documents and oral agreements is often of paramount importance.@@@@1@25@@oe@19-8-2009 10020370@unknown@formal@none@1@S@==Intentional application==@@@@1@2@@oe@19-8-2009 10020380@unknown@formal@none@1@S@[[Philosopher]]s (and other users of [[logic]]) spend a lot of time and effort searching for and removing (or intentionally adding) ambiguity in arguments, because it can lead to incorrect conclusions and can be used to deliberately conceal bad arguments.@@@@1@39@@oe@19-8-2009 10020390@unknown@formal@none@1@S@For example, a politician might say “I oppose taxes that hinder economic growth.”@@@@1@13@@oe@19-8-2009 10020400@unknown@formal@none@1@S@Some 
will think he opposes taxes in general, because they hinder economic growth.@@@@1@13@@oe@19-8-2009 10020410@unknown@formal@none@1@S@Others may think he opposes only those taxes that he believes will hinder economic growth (although in writing, the correct insertion or omission of a [[comma (punctuation)|comma]] after “taxes” and the use of "which" can help reduce ambiguity here.@@@@1@39@@oe@19-8-2009 10020420@unknown@formal@none@1@S@For the first meaning, “, which” is properly used in place of “that”), or restructure the sentence to completely eliminate possible misinterpretation.@@@@1@22@@oe@19-8-2009 10020430@unknown@formal@none@1@S@The devious politician hopes that each [[constituent (politics)]] will interpret the above statement in the most desirable way, and think the politician supports everyone's opinion.@@@@1@25@@oe@19-8-2009 10020440@unknown@formal@none@1@S@However, the opposite can also be true - An opponent can turn a positive statement into a bad one, if the speaker uses ambiguity (intentionally or not).@@@@1@27@@oe@19-8-2009 10020450@unknown@formal@none@1@S@The logical fallacies of [[amphiboly]] and [[equivocation]] rely heavily on the use of ambiguous words and phrases.@@@@1@17@@oe@19-8-2009 10020460@unknown@formal@none@1@S@In [[literature]] and [[rhetoric]], on the other hand, ambiguity can be a useful tool.@@@@1@14@@oe@19-8-2009 10020470@unknown@formal@none@1@S@[[Groucho Marx]]’s classic joke depends on a grammatical ambiguity for its [[humor]], for example: “Last night I shot an elephant in my pajamas.@@@@1@23@@oe@19-8-2009 10020480@unknown@formal@none@1@S@What he was doing in my pajamas I’ll never know.”@@@@1@10@@oe@19-8-2009 10020490@unknown@formal@none@1@S@Ambiguity can also be used as a comic device through a genuine intention to confuse, as does Magic: The Gathering's Unhinged © Ambiguity, which makes puns with [[homophone]]s, mispunctuation, and run-ons: “Whenever a player plays a spell that counters a spell that has been played[,] or a player plays a spell that comes into play with counters, that player may counter the next spell played[,] or put an additional counter on a permanent that has already been played, but not countered.”@@@@1@81@@oe@19-8-2009 10020500@unknown@formal@none@1@S@Songs and poetry often rely on ambiguous words for artistic effect, as in the song title “Don’t It Make My Brown Eyes Blue” (where “blue” can refer to the color, or to [[sadness]]).@@@@1@33@@oe@19-8-2009 10020510@unknown@formal@none@1@S@In narrative, ambiguity can be introduced in several ways: motive, plot, character.@@@@1@12@@oe@19-8-2009 10020520@unknown@formal@none@1@S@[[F. 
Scott Fitzgerald]] uses the latter type of ambiguity with notable effect in his novel ''[[The Great Gatsby]]''.@@@@1@18@@oe@19-8-2009 10020530@unknown@formal@none@1@S@All [[religions]] debate the [[orthodoxy]] or [[heterodoxy]] of ambiguity.@@@@1@9@@oe@19-8-2009 10020540@unknown@formal@none@1@S@[[Christianity]] and [[Judaism]] employ the concept of [[paradox]] synonymously with 'ambiguity'.@@@@1@11@@oe@19-8-2009 10020550@unknown@formal@none@1@S@Ambiguity within Christianity (and other religions) is resisted by the conservatives and fundamentalists, who regard the concept as equating with 'contradiction'.@@@@1@21@@oe@19-8-2009 10020560@unknown@formal@none@1@S@Non-fundamentalist Christians and Jews endorse [[Rudolf Otto]]'s description of the sacred as 'mysterium tremendum et fascinans', the awe-inspiring mystery which fascinates humans.@@@@1@22@@oe@19-8-2009 10020570@unknown@formal@none@1@S@[[Metonymy]] involves the use of the name of a subcomponent part as an abbreviation, or [[jargon]], for the name of the whole object (for example "wheels" to refer to a car, or "flowers" to refer to beautiful offspring, an entire plant, or a collection of blooming plants).@@@@1@47@@oe@19-8-2009 10020580@unknown@formal@none@1@S@In modern [[vocabulary]] critical [[semiotics]], metonymy encompasses any potentially-ambiguous word substitution that is based on contextual [[contiguity]] (located close together), or a function or process that an object performs, such as "sweet ride" to refer to a nice car.@@@@1@39@@oe@19-8-2009 10020590@unknown@formal@none@1@S@Metonym miscommunication is considered a primary mechanism of linguistic humour.@@@@1@10@@oe@19-8-2009 10020600@unknown@formal@none@1@S@==Psychology and management==@@@@1@3@@oe@19-8-2009 10020610@unknown@formal@none@1@S@In sociology and social psychology, the term "ambiguity" is used to indicate situations that involve [[uncertainty]].@@@@1@16@@oe@19-8-2009 10020620@unknown@formal@none@1@S@An increasing amount of research is concentrating on how people react and respond to ambiguous situations.@@@@1@16@@oe@19-8-2009 10020630@unknown@formal@none@1@S@Much of this focuses on [[ambiguity tolerance]].@@@@1@7@@oe@19-8-2009 10020640@unknown@formal@none@1@S@A number of correlations have been found between an individual’s reaction and tolerance to ambiguity and a range of factors.@@@@1@20@@oe@19-8-2009 10020650@unknown@formal@none@1@S@Apter and Desselles (2001) for example, found a strong correlation with such attributes and factors like a greater preference for safe as opposed to risk based sports, a preference for endurance type activities as opposed to explosive activities, a more organized and less casual lifestyle, greater care and precision in descriptions, a lower sensitivity to emotional and unpleasant words, a less acute sense of humour, engaging a smaller variety of sexual practices than their more risk comfortable colleagues, a lower likelihood of the use of drugs, pornography and drink, a greater likelihood of displaying obsessional behaviour.@@@@1@96@@oe@19-8-2009 10020660@unknown@formal@none@1@S@In the field of [[leadership]] [[David Wilkinson (ambiguity expert)|David Wilkinson]] (2006) found strong correlations between an individual leaders reaction to ambiguous situations and the [[Modes of Leadership]] they use, the type of [[creativity]] (Kirton (2003) and how they relate to others.@@@@1@41@@oe@19-8-2009 10020670@unknown@formal@none@1@S@==Music==@@@@1@1@@oe@19-8-2009 10020680@unknown@formal@none@1@S@In [[music]], pieces or sections which confound 
expectations and can be interpreted simultaneously in different ways are ambiguous, such as some [[polytonality]], [[polymeter]], other ambiguous [[metre|meters]] or [[rhythm]]s, and ambiguous [[phrase (music)|phrasing]], or (Stein 2005, p.79) any [[aspect of music]].@@@@1@42@@oe@19-8-2009 10020690@unknown@formal@none@1@S@The [[music of Africa]] is often purposely ambiguous.@@@@1@8@@oe@19-8-2009 10020700@unknown@formal@none@1@S@To quote [[Donald Francis Tovey|Sir Donald Francis Tovey]] (1935, p.195), “Theorists are apt to vex themselves with vain efforts to remove uncertainty just where it has a high aesthetic value.”@@@@1@30@@oe@19-8-2009 10020710@unknown@formal@none@1@S@==Constructed language==@@@@1@2@@oe@19-8-2009 10020720@unknown@formal@none@1@S@Some [[Conlang|languages have been created]] with the intention of avoiding ambiguity, especially lexical ambiguity.@@@@1@14@@oe@19-8-2009 10020730@unknown@formal@none@1@S@[[Lojban]] and [[Loglan]] are two related languages which have been created with this in mind.@@@@1@15@@oe@19-8-2009 10020740@unknown@formal@none@1@S@The languages can be both spoken and written.@@@@1@8@@oe@19-8-2009 10020750@unknown@formal@none@1@S@These languages are intended to provide greater technical precision than natural languages, although historically, such attempts at language improvement have been criticized.@@@@1@23@@oe@19-8-2009 10020760@unknown@formal@none@1@S@Languages composed from many diverse sources contain much ambiguity and inconsistency.@@@@1@11@@oe@19-8-2009 10020770@unknown@formal@none@1@S@The many exceptions to [[syntax]] and [[semantic]] rules are time-consuming and difficult to learn.@@@@1@14@@oe@19-8-2009 10020780@unknown@formal@none@1@S@==Mathematics and physics==@@@@1@3@@oe@19-8-2009 10020790@unknown@formal@none@1@S@[[Mathematical notation]], widely used in [[physics]] and other [[science]]s, avoids many ambiguities compared to expression in natural language.@@@@1@18@@oe@19-8-2009 10020800@unknown@formal@none@1@S@However, for various reasons, several [[Lexical (semiotics)|lexical]], [[syntactic]] and [[semantic]] ambiguities remain.@@@@1@12@@oe@19-8-2009 10020810@unknown@formal@none@1@S@===Names of functions===@@@@1@3@@oe@19-8-2009 10020820@unknown@formal@none@1@S@The ambiguity in the style of writing a function should not be confused with a [[multivalued function]], which can (and should) be defined in a deterministic and unambiguous way.@@@@1@29@@oe@19-8-2009 10020830@unknown@formal@none@1@S@Several [[special function]]s still do not have established notations.@@@@1@9@@oe@19-8-2009 10020840@unknown@formal@none@1@S@Usually, the conversion to another notation requires rescaling the argument and/or the resulting value; sometimes, the same name of the function is used, causing confusion.@@@@1@26@@oe@19-8-2009 10020850@unknown@formal@none@1@S@Examples of such underestablished functions:@@@@1@5@@oe@19-8-2009 10020860@unknown@formal@none@1@S@* [[Sinc function]]@@@@1@3@@oe@19-8-2009 10020870@unknown@formal@none@1@S@* [[Elliptic integral#Complete_elliptic_integral_of_the_third_kind|Elliptic integral of the Third Kind]]; translating an elliptic integral from [[MAPLE]] to [[Mathematica]], one should replace the second argument by its square, see [[Talk:Elliptic integral#List_of_notations]]; dealing with complex values, this may cause problems.@@@@1@35@@oe@19-8-2009 10020880@unknown@formal@none@1@S@* [[Exponential integral]], page 228 http://www.math.sfu.ca/~cbm/aands/page_228.htm@@@@1@7@@oe@19-8-2009 10020890@unknown@formal@none@1@S@* [[Hermite
polynomial]], page 775 http://www.math.sfu.ca/~cbm/aands/page_775.htm@@@@1@7@@oe@19-8-2009 10020900@unknown@formal@none@1@S@===Expressions===@@@@1@1@@oe@19-8-2009 10020910@unknown@formal@none@1@S@Ambiguous expressions often appear in physical and mathematical texts.@@@@1@9@@oe@19-8-2009 10020920@unknown@formal@none@1@S@It is common practice to omit multiplication signs in mathematical expressions.@@@@1@11@@oe@19-8-2009 10020930@unknown@formal@none@1@S@Also, it is common to give the same name to a variable and a function, for example, ~f=f(x)~.@@@@1@18@@oe@19-8-2009 10020940@unknown@formal@none@1@S@Then, if one sees ~g=f(y+1)~, there is no way to tell whether it means ~f=f(x)~ '''multiplied''' by ~(y+1)~ or the function ~f~ '''evaluated''' at an argument equal to ~(y+1)~.@@@@1@27@@oe@19-8-2009 10020950@unknown@formal@none@1@S@In each use of such notations, the reader is expected to perform the deduction and recover the intended meaning.@@@@1@24@@oe@19-8-2009 10020960@unknown@formal@none@1@S@Creators of algorithmic languages try to avoid ambiguities.@@@@1@8@@oe@19-8-2009 10020970@unknown@formal@none@1@S@Many algorithmic languages ([[C++]], [[MATLAB]], [[Fortran]], [[Maple]]) require the character * as the symbol of multiplication.@@@@1@15@@oe@19-8-2009 10020980@unknown@formal@none@1@S@The language [[Mathematica]] allows the user to omit the multiplication symbol, but requires square brackets to indicate the argument of a function; square brackets are not allowed for grouping of expressions.@@@@1@31@@oe@19-8-2009 10020990@unknown@formal@none@1@S@Fortran, in addition, does not allow use of the same name (identifier) for different objects, for example, function and variable; in particular, the expression '''f=f(x)''' is qualified as an error.@@@@1@30@@oe@19-8-2009 10021000@unknown@formal@none@1@S@The order of operations may depend on the context.@@@@1@9@@oe@19-8-2009 10021010@unknown@formal@none@1@S@In most [[programming language]]s, the operations of division and multiplication have equal priority and are executed from left to right.@@@@1@20@@oe@19-8-2009 10021020@unknown@formal@none@1@S@Until the last century, many publications assumed that multiplication is performed first, for example, ~a/bc~ is interpreted as ~a/(bc)~; in this case, the insertion of parentheses is required when translating the formulas to an algorithmic language (see the example below).@@@@1@36@@oe@19-8-2009 10021030@unknown@formal@none@1@S@In addition, it is common to write an argument of a function without parentheses, which also may lead to ambiguity.@@@@1@20@@oe@19-8-2009 10021040@unknown@formal@none@1@S@Sometimes, one uses ''italic'' letters to denote elementary functions.@@@@1@9@@oe@19-8-2009 10021050@unknown@formal@none@1@S@In the [[scientific journal]] style, the expression ~ s i n \\alpha~ means the product of the variables ~s~, ~i~, ~n~ and ~\\alpha~, although in a slideshow, it may mean ~\\sin[\\alpha]~.@@@@1@29@@oe@19-8-2009 10021060@unknown@formal@none@1@S@The comma in subscripts and superscripts is sometimes omitted; this is also ambiguous notation.@@@@1@13@@oe@19-8-2009 10021070@unknown@formal@none@1@S@If it is written ~T_{mnk}~, the reader must guess from the context whether it means a single-index object evaluated at a subscript equal to the product of the variables ~m~, ~n~ and ~k~, or a three-valent tensor.@@@@1@40@@oe@19-8-2009
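A brief Python sketch can make the points above concrete (it is purely illustrative; the numbers and the names a, b, c and f are arbitrary and not part of the notation used in this article): in a programming language the multiplication sign must be written out, division and multiplication are applied from left to right, and a function and a plain value need distinct names.
<source lang="python">
a, b, c = 12.0, 2.0, 3.0

# "/" and "*" have equal priority and are applied from left to right,
# so the multiplication sign cannot be omitted:
print(a / b * c)     # read as (a/b)*c, prints 18.0
print(a / (b * c))   # the older textbook reading a/(bc), prints 2.0

# A function and its value need distinct names; an assignment such as
# f = f(x) would silently replace the function by a number.
def f(x):
    return x + 1

y = f(c)             # the function f evaluated at the argument c
z = f(c) * (c + 1)   # juxtaposition such as f(c)(c+1) must be written with *
</source>
10021080@unknown@formal@none@1@S@The writing of ~T_{mnk}~ instead of ~T_{m,n,k}~ may mean that the writer either is short of space (for example, to reduce the publication fees) or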
aims to increase number of publications without considering readers.@@@@1@34@@oe@19-8-2009 10021090@unknown@formal@none@1@S@The same may apply to any other use of ambiguous notations.@@@@1@11@@oe@19-8-2009 10021100@unknown@formal@none@1@S@===Examples of potentially confusing ambiguous mathematical expressions ===@@@@1@8@@oe@19-8-2009 10021110@unknown@formal@none@1@S@\\sin^2\\alpha/2\\,, which could be understood to mean either (\\sin(\\alpha/2))^2\\, or (\\sin(\\alpha))^2/2\\,.@@@@1@11@@oe@19-8-2009 10021120@unknown@formal@none@1@S@~\\sin^{-1} \\alpha, which by convention means ~\\arcsin(\\alpha) ~, though it might be thought to mean (\\sin(\\alpha))^{-1}\\, since ~\\sin^{n} \\alpha means (\\sin(\\alpha))^{n}\\,.@@@@1@21@@oe@19-8-2009 10021130@unknown@formal@none@1@S@a/2b\\,, which arguably should mean (a/2)b\\, but would commonly be understood to mean a/(2b)\\,@@@@1@14@@oe@19-8-2009 10021140@unknown@formal@none@1@S@===Notations in [[quantum optics]] and [[quantum mechanics]]===@@@@1@7@@oe@19-8-2009 10021150@unknown@formal@none@1@S@It is common to define the [[coherent states]] in [[quantum optics]] with ~|\\alpha\\rangle~ and states with fixed number of photons with ~|n\\rangle~.@@@@1@23@@oe@19-8-2009 10021160@unknown@formal@none@1@S@Then, there is an "unwritten rule": the state is coherent if there are more Greek characters than Latin characters in the argument, and ~n~photon state if the Latin characters dominate.@@@@1@30@@oe@19-8-2009 10021170@unknown@formal@none@1@S@The ambiguity becomes even worse, if ~|x\\rangle~ is used for the states with certain value of the coordinate, and ~|p\\rangle~ means the state with certain value of the momentum, which may be used in books on [[quantum mechanics]].@@@@1@38@@oe@19-8-2009 10021180@unknown@formal@none@1@S@Such ambiguities easy lead to confusions, especially if some normalized [[adimensional]], [[dimensionless]] variables are used.@@@@1@15@@oe@19-8-2009 10021190@unknown@formal@none@1@S@Expression |1\\rangle may mean a state with single photon, or the coherent state with mean amplitude equal to 1, or state with momentum equal to unity, and so on.@@@@1@31@@oe@19-8-2009 10021200@unknown@formal@none@1@S@The reader is supposed to guess from the context.@@@@1@9@@oe@19-8-2009 10021210@unknown@formal@none@1@S@===Examples of ambiguous terms in physics===@@@@1@6@@oe@19-8-2009 10021220@unknown@formal@none@1@S@Some physical quantities do not yet have established notations; their value (and sometimes even [[dimension]], as in the case of the [[Einstein coefficients]]) depends on the system of notations.@@@@1@29@@oe@19-8-2009 10021230@unknown@formal@none@1@S@A highly confusing term is [[gain]].@@@@1@6@@oe@19-8-2009 10021240@unknown@formal@none@1@S@For example, the sentence "the gain of a system should be doubled", without context, means close to nothing.@@@@1@18@@oe@19-8-2009 10021250@unknown@formal@none@1@S@It may mean that the ratio of the output voltage of an electric circuit to the input voltage should be doubled.@@@@1@21@@oe@19-8-2009 10021260@unknown@formal@none@1@S@It may mean that the ratio of the output power of an electric or optical circuit to the input power should be doubled.@@@@1@23@@oe@19-8-2009 10021270@unknown@formal@none@1@S@It may mean that the gain of the laser medium should be doubled, for example, doubling the population of the upper laser level in a quasi-two level system (assuming negligible absorption of the ground-state).@@@@1@34@@oe@19-8-2009 10021280@unknown@formal@none@1@S@Also, confusions may be related with the use of 
[[atomic percent]] as a measure of the concentration of a [[dopant]], or [[Optical resolution|resolution]] of an [[imaging system]], as a measure of the size of the smallest detail which can still be resolved against the background of statistical noise.@@@@1@45@@oe@19-8-2009 10021290@unknown@formal@none@1@S@See also [[Accuracy and precision]] and its talk.@@@@1@8@@oe@19-8-2009 10021300@unknown@formal@none@1@S@Many terms are ambiguous.@@@@1@4@@oe@19-8-2009 10021310@unknown@formal@none@1@S@Each use of an ambiguous term should be preceded by a definition suitable for the specific case.@@@@1@17@@oe@19-8-2009 10021320@unknown@formal@none@1@S@The [[Berry paradox]] arises as a result of systematic ambiguity.@@@@1@10@@oe@19-8-2009 10021330@unknown@formal@none@1@S@In various formulations of the Berry paradox, such as one that reads: ''The number not nameable in less than eleven syllables'', the term ''nameable'' is one that has this systematic ambiguity.@@@@1@31@@oe@19-8-2009 10021340@unknown@formal@none@1@S@Terms of this kind give rise to [[vicious circle]] fallacies.@@@@1@10@@oe@19-8-2009 10021350@unknown@formal@none@1@S@Other terms with this type of ambiguity are: satisfiable, definable, true, false, function, property, class, relation, cardinal, and ordinal.@@@@1@19@@oe@19-8-2009 10021360@unknown@formal@none@1@S@==Pedagogic use of ambiguous expressions==@@@@1@5@@oe@19-8-2009 10021370@unknown@formal@none@1@S@Ambiguity can be used as a pedagogical trick to force students to reproduce the deduction by themselves.@@@@1@17@@oe@19-8-2009 10021380@unknown@formal@none@1@S@Some textbooks give the same name to the function and to its [[Fourier transform]]:@@@@1@14@@oe@19-8-2009 10021390@unknown@formal@none@1@S@:~f(\\omega)=\\int f(t) \\exp(i\\omega t) {\\rm d}t .@@@@1@7@@oe@19-8-2009 10021400@unknown@formal@none@1@S@Rigorously speaking, such an expression requires that ~ f=0 ~; even if the function ~ f ~ is a [[self-Fourier function]], the expression should be written as ~f(\\omega)=\\frac{1}{\\sqrt{2\\pi}}\\int f(t) \\exp(i\\omega t) {\\rm d}t ; however, it is assumed that the shape of the function (and even its norm \\int |f(x)|^2 {\\rm d}x ) depends on the character used to denote its argument.@@@@1@62@@oe@19-8-2009 10021410@unknown@formal@none@1@S@If the Greek letter is used, it is assumed to be the [[Fourier transform]] of another function. The first function is assumed if the expression in the argument contains more characters ~t~ or ~\\tau~ than characters ~\\omega~, and the second function is assumed in the opposite case.@@@@1@47@@oe@19-8-2009 10021420@unknown@formal@none@1@S@Expressions like ~f(\\omega t)~ or ~f(y)~ contain symbols ~t~ and ~\\omega~ in equal amounts; they are ambiguous and should be avoided in serious deduction.@@@@1@24@@oe@19-8-2009
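The same-name convention above cannot be carried over into executable form, where a function and its transform must be distinct objects. The following sketch is illustrative only (the box-shaped signal, the grid parameters and the name f_hat are invented for the example); it approximates the integral given above numerically while keeping separate names for the function and its transform.
<source lang="python">
import cmath

def f(t):
    # an example signal: a box of height 1 on the interval [-1, 1]
    return 1.0 if abs(t) <= 1.0 else 0.0

def f_hat(omega, n=4000, t_max=10.0):
    # crude Riemann-sum approximation of the integral of f(t)*exp(i*omega*t) dt;
    # the transform needs its own name and cannot reuse the name f
    dt = 2.0 * t_max / n
    return sum(f(-t_max + k * dt) * cmath.exp(1j * omega * (-t_max + k * dt))
               for k in range(n)) * dt

print(abs(f_hat(0.0)))   # approximately 2.0, the area under f
</source>
10030010@unknown@formal@none@1@S@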
Artificial intelligence
@@@@1@2@@oe@19-8-2009 10030020@unknown@formal@none@1@S@'''Artificial intelligence (AI)''' is both the [[intelligence]] of machines and the branch of [[computer science]] which aims to create it.@@@@1@20@@oe@19-8-2009 10030030@unknown@formal@none@1@S@Major AI textbooks define artificial intelligence as "the study and design of [[intelligent agents]]," where an [[intelligent agent]] is a system that perceives its environment and takes actions which maximize its chances of success.@@@@1@34@@oe@19-8-2009 10030040@unknown@formal@none@1@S@[[John McCarthy (computer scientist)|John McCarthy]], who coined the term in 1956, defines it as "the science and engineering of making intelligent machines."@@@@1@22@@oe@19-8-2009 10030050@unknown@formal@none@1@S@Among the traits that researchers hope machines will exhibit are [[:#Deduction, reasoning, problem solving|reasoning]], [[#Knowledge representation|knowledge]], [[#Planning|planning]], [[#Learning|learning]], [[#Natural language processing|communication]], [[#Perception|perception]] and the ability to [[#Motion and manipulation|move]] and manipulate objects.@@@@1@32@@oe@19-8-2009 10030055@unknown@formal@none@1@S@[[#General intelligence|General intelligence]] (or "[[strong AI]]") has not yet been achieved and is a long-term goal of some AI research.@@@@1@20@@oe@19-8-2009 10030060@unknown@formal@none@1@S@AI research uses tools and insights from many fields, including [[computer science]], [[psychology]], [[philosophy]], [[neuroscience]], [[cognitive science]], [[computational linguistics|linguistics]], [[ontology (information science)|ontology]], [[operations research]], [[computational economics|economics]], [[control theory]], [[probability]], [[optimization (mathematics)|optimization]] and [[logic]].@@@@1@33@@oe@19-8-2009 10030070@unknown@formal@none@1@S@AI research also overlaps with tasks such as [[robotics]], [[control system]]s, [[automated planning and scheduling|scheduling]], [[data mining]], [[logistics]], [[speech recognition]], [[facial recognition system|facial recognition]] and many others.@@@@1@27@@oe@19-8-2009 10030080@unknown@formal@none@1@S@Other names for the field have been proposed, such as [[computational intelligence]], [[synthetic intelligence]], [[intelligent systems]], or computational rationality.@@@@1@19@@oe@19-8-2009 10030090@unknown@formal@none@1@S@== Perspectives on AI ==@@@@1@5@@oe@19-8-2009 10030100@unknown@formal@none@1@S@=== AI in myth, fiction and speculation ===@@@@1@8@@oe@19-8-2009 10030110@unknown@formal@none@1@S@Humanity has imagined in great detail the implications of thinking machines or artificial beings.@@@@1@14@@oe@19-8-2009 10030120@unknown@formal@none@1@S@They appear in [[Greek myth]]s, such as [[Talos]] of [[Crete]], the golden robots of [[Hephaestus]] and [[Pygmalion (mythology)|Pygmalion's]] [[Galatea (mythology)|Galatea]].@@@@1@20@@oe@19-8-2009 10030130@unknown@formal@none@1@S@The earliest known humanoid robots (or [[automaton]]s) were [[cult image|sacred statue]]s worshipped in [[Egypt]] and [[Greece]], believed to have been endowed with genuine consciousness by craftsman.@@@@1@26@@oe@19-8-2009 10030140@unknown@formal@none@1@S@In the sixteenth century, the [[alchemist]] [[Paracelsus]] claimed to have created artificial beings.@@@@1@13@@oe@19-8-2009 10030150@unknown@formal@none@1@S@Realistic clockwork imitations of human beings have been built by people such as [[King Mu of Zhou#Robotics|Yan Shi]], [[Hero of Alexandria]], [[Al-Jazari]] and [[Wolfgang von Kempelen]].@@@@1@26@@oe@19-8-2009 
10030160@unknown@formal@none@1@S@In modern fiction, beginning with [[Mary Shelley]]'s classic ''[[Frankenstein]],'' writers have explored the [[ethics of artificial intelligence|ethical]] issues presented by thinking machines.@@@@1@22@@oe@19-8-2009 10030170@unknown@formal@none@1@S@If a machine can be created that has intelligence, can it also ''feel''?@@@@1@13@@oe@19-8-2009 10030180@unknown@formal@none@1@S@If it can feel, does it have the same rights as a human being?@@@@1@14@@oe@19-8-2009 10030190@unknown@formal@none@1@S@This is a key issue in ''[[Frankenstein]]'' as well as in modern science fiction: for example, the film ''[[Artificial Intelligence: A.I.]]'' considers a machine in the form of a small boy which has been given the ability to feel human emotions, including, tragically, the capacity to suffer.@@@@1@47@@oe@19-8-2009 10030200@unknown@formal@none@1@S@This issue is also being considered by [[futurist]]s, such as California's [[Institute for the Future]] under the name "[[robot rights]]", although many critics believe that the discussion is premature.@@@@1@29@@oe@19-8-2009 10030210@unknown@formal@none@1@S@[[Science fiction]] writers and [[futurist]]s have also speculated on the technology's potential impact on humanity.@@@@1@15@@oe@19-8-2009 10030220@unknown@formal@none@1@S@In fiction, AI has appeared as a servant ([[R2D2]] in ''[[Star Wars]]''), a comrade ([[Data (Star Trek)|Lt. Commander Data]] in ''[[Star Trek]]''), an extension to human abilities (''[[Ghost in the Shell]]''), a conqueror (''[[The Matrix]]''), a dictator (''[[With Folded Hands]]'') and an exterminator (''[[Terminator (series)|Terminator]]'', ''[[Battlestar Galactica (re-imagining)|Battlestar Galactica]]'').@@@@1@49@@oe@19-8-2009 10030230@unknown@formal@none@1@S@Some realistic potential consequences of AI are decreased human labor demand, the enhancement of human ability or experience, and a need for redefinition of human identity and basic values.@@@@1@29@@oe@19-8-2009 10030240@unknown@formal@none@1@S@[[Futurist]]s estimate the capabilities of machines using [[Moore's Law]], which measures the relentless exponential improvement in digital technology with uncanny accuracy.@@@@1@21@@oe@19-8-2009 10030250@unknown@formal@none@1@S@[[Ray Kurzweil]] has calculated that [[desktop computer]]s will have the same processing power as human brains by the year 2029, and that by 2045 artificial intelligence will reach a point where it is able to improve ''itself'' at a rate that far exceeds anything conceivable in the past, a scenario that [[science fiction]] writer [[Vernor Vinge]] named the "[[technological singularity]]".@@@@1@60@@oe@19-8-2009 10030260@unknown@formal@none@1@S@"Artificial intelligence is the next stage in evolution," [[Edward Fredkin]] said in the 1980s, expressing an idea first proposed by [[Samuel Butler (novelist)|Samuel Butler]]'s ''[[Darwin Among the Machines]]'' (1863), and expanded upon by [[George Dyson (science historian)|George Dyson]] in his book of the same name (1998).@@@@1@46@@oe@19-8-2009 10030270@unknown@formal@none@1@S@Several [[futurist]]s and [[science fiction]] writers have predicted that human beings and machines will merge in the future into [[cyborg]]s that are more capable and powerful than either.@@@@1@28@@oe@19-8-2009 10030280@unknown@formal@none@1@S@This idea, called [[transhumanism]], has roots in [[Aldous Huxley]] and [[Robert Ettinger]], is now associated with [[robotics|robot]] designer [[Hans Moravec]], [[cybernetics|cyberneticist]] [[Kevin Warwick]] and [[Ray 
Kurzweil]].@@@@1@26@@oe@19-8-2009 10030290@unknown@formal@none@1@S@[[Transhumanism]] has been illustrated in fiction as well, for example on the [[manga]] ''[[Ghost in the Shell]]''@@@@1@17@@oe@19-8-2009 10030300@unknown@formal@none@1@S@=== History of AI research ===@@@@1@6@@oe@19-8-2009 10030310@unknown@formal@none@1@S@In the middle of the 20th century, a handful of scientists began a new approach to building intelligent machines, based on recent discoveries in [[neurology]], a new mathematical theory of [[information]], an understanding of control and stability called [[cybernetic]]s, and above all, by the invention of the [[digital computer]], a machine based on the abstract essence of mathematical reasoning.@@@@1@59@@oe@19-8-2009 10030320@unknown@formal@none@1@S@The field of modern AI research was founded at conference on the campus of [[Dartmouth College]] in the summer of 1956.@@@@1@21@@oe@19-8-2009 10030330@unknown@formal@none@1@S@Those who attended would become the leaders of AI research for many decades, especially [[John McCarthy (computer scientist)|John McCarthy]], [[Marvin Minsky]], [[Allen Newell]] and [[Herbert Simon]], who founded AI laboratories at [[MIT]], [[Carnegie Mellon University|CMU]] and [[Stanford]].@@@@1@37@@oe@19-8-2009 10030340@unknown@formal@none@1@S@They and their students wrote programs that were, to most people, simply astonishing: computers were solving word problems in algebra, proving logical theorems and speaking English.@@@@1@26@@oe@19-8-2009 10030350@unknown@formal@none@1@S@By the middle 60s their research was heavily funded by the [[DARPA|U.S. Department of Defense]] and they were optimistic about the future of the new field:@@@@1@26@@oe@19-8-2009 10030360@unknown@formal@none@1@S@* 1965, [[H. A. Simon]]: "[M]achines will be capable, within twenty years, of doing any work a man can do"@@@@1@20@@oe@19-8-2009 10030370@unknown@formal@none@1@S@* 1967, [[Marvin Minsky]]: "Within a generation ... the problem of creating 'artificial intelligence' will substantially be solved."@@@@1@18@@oe@19-8-2009 10030380@unknown@formal@none@1@S@These predictions, and many like them, would not come true.@@@@1@10@@oe@19-8-2009 10030390@unknown@formal@none@1@S@They had failed to recognize the difficulty of some of the problems they faced.@@@@1@14@@oe@19-8-2009 10030400@unknown@formal@none@1@S@In 1974, in response to the criticism of England's [[Sir James Lighthill]] and ongoing pressure from Congress to fund more productive projects, the U.S. 
and British governments cut off all undirected, exploratory research in AI.@@@@1@35@@oe@19-8-2009 10030410@unknown@formal@none@1@S@This was the first [[AI Winter]].@@@@1@6@@oe@19-8-2009 10030420@unknown@formal@none@1@S@In the early 80s, AI research was revived by the commercial success of [[expert systems]] (a form of AI program that simulated the knowledge and analytical skills of one or more human experts) and by 1985 the market for AI had reached more than a billion dollars.@@@@1@47@@oe@19-8-2009 10030430@unknown@formal@none@1@S@[[Marvin Minsky|Minsky]] and others warned the community that enthusiasm for AI had spiraled out of control and that disappointment was sure to follow.@@@@1@23@@oe@19-8-2009 10030440@unknown@formal@none@1@S@Beginning with the collapse of the [[Lisp Machine]] market in 1987, AI once again fell into disrepute, and a second, more lasting [[AI Winter]] began.@@@@1@25@@oe@19-8-2009 10030450@unknown@formal@none@1@S@In the 90s and early 21st century AI achieved its greatest successes, albeit somewhat behind the scenes.@@@@1@17@@oe@19-8-2009 10030460@unknown@formal@none@1@S@Artificial intelligence was adopted throughout the technology industry, providing the heavy lifting for [[logistics]], [[data mining]], [[medical diagnosis]] and many other areas.@@@@1@22@@oe@19-8-2009 10030470@unknown@formal@none@1@S@The success was due to several factors: the incredible power of computers today (see [[Moore's law]]), a greater emphasis on solving specific subproblems, the creation of new ties between AI and other fields working on similar problems, and above all a new commitment by researchers to solid mathematical methods and rigorous scientific standards.@@@@1@53@@oe@19-8-2009 10030480@unknown@formal@none@1@S@=== Philosophy of AI ===@@@@1@5@@oe@19-8-2009 10030490@unknown@formal@none@1@S@In a [[Computing Machinery and Intelligence|classic 1950 paper]], [[Alan Turing]] posed the question "Can Machines Think?"@@@@1@16@@oe@19-8-2009 10030500@unknown@formal@none@1@S@In the years since, the [[philosophy of artificial intelligence]] has attempted to answer it.@@@@1@14@@oe@19-8-2009 10030510@unknown@formal@none@1@S@* [[Turing Test|Turing's "polite convention"]]: ''If a machine acts as intelligently as a human being, then it is as intelligent as a human being.''@@@@1@24@@oe@19-8-2009 10030520@unknown@formal@none@1@S@[[Alan Turing]] theorized that, ultimately, we can only judge the intelligence of machine based on its behavior.@@@@1@17@@oe@19-8-2009 10030530@unknown@formal@none@1@S@This theory forms the basis of the [[Turing test]].@@@@1@9@@oe@19-8-2009 10030540@unknown@formal@none@1@S@* The [[Dartmouth Conferences|Dartmouth proposal]]: ''Every aspect of learning or any other feature of intelligence can be so precisely described that a machine can be made to simulate it.''@@@@1@29@@oe@19-8-2009 10030550@unknown@formal@none@1@S@This assertion was printed in the proposal for the [[Dartmouth Conferences|Dartmouth Conference]] of 1956, and represents the position of most working AI researchers.@@@@1@23@@oe@19-8-2009 10030560@unknown@formal@none@1@S@* [[Alan Newell|Newell]] and [[Herbert Simon|Simon]]'s physical symbol system hypothesis: ''A [[physical symbol system]] has the necessary and sufficient means of general intelligent action.''@@@@1@24@@oe@19-8-2009 10030570@unknown@formal@none@1@S@This statement claims that the essence of intelligence is symbol manipulation.@@@@1@11@@oe@19-8-2009 10030580@unknown@formal@none@1@S@[[Hubert Dreyfus]] argued that, on the contrary, human expertise 
depends on unconscious instinct rather than conscious symbol manipulation and on having a "feel" for the situation rather than explicit symbolic knowledge.@@@@1@31@@oe@19-8-2009 10030590@unknown@formal@none@1@S@* [[Gödel's incompleteness theorem]]: ''A [[physical symbol system]] can not prove all true statements.''@@@@1@14@@oe@19-8-2009 10030600@unknown@formal@none@1@S@[[Roger Penrose]] is among those who claim that Gödel's theorem limits what machines can do.@@@@1@15@@oe@19-8-2009 10030610@unknown@formal@none@1@S@* [[John Searle|Searle]]'s "strong AI position": ''A [[physical symbol system]] can have a [[mind]] and [[consciousness|mental states]].''@@@@1@17@@oe@19-8-2009 10030620@unknown@formal@none@1@S@Searle counters this assertion with his [[Chinese room]] argument, which asks us to look ''inside'' the computer and try to find where the "mind" might be.@@@@1@26@@oe@19-8-2009 10030630@unknown@formal@none@1@S@* The [[artificial brain]] argument: ''The brain can be simulated.''@@@@1@10@@oe@19-8-2009 10030640@unknown@formal@none@1@S@[[Hans Moravec]], [[Ray Kurzweil]] and others have argued that it is technologically feasible to copy the brain directly into hardware and software, and that such a simulation will be essentially identical to the original.@@@@1@34@@oe@19-8-2009 10030650@unknown@formal@none@1@S@This argument combines the idea that a [[Turing complete|suitably powerful]] machine can simulate any process, with the [[materialist]] idea that the [[mind]] is the result of a physical process in the [[brain]].@@@@1@32@@oe@19-8-2009 10030660@unknown@formal@none@1@S@== AI research ==@@@@1@4@@oe@19-8-2009 10030670@unknown@formal@none@1@S@=== Problems of AI ===@@@@1@5@@oe@19-8-2009 10030680@unknown@formal@none@1@S@While there is no universally accepted definition of intelligence, AI researchers have studied several traits that are considered essential.@@@@1@19@@oe@19-8-2009 10030690@unknown@formal@none@1@S@====Deduction, reasoning, problem solving ====@@@@1@5@@oe@19-8-2009 10030700@unknown@formal@none@1@S@Early AI researchers developed algorithms that imitated the process of conscious, step-by-step reasoning that human beings use when they solve puzzles, play board games, or make logical deductions.@@@@1@28@@oe@19-8-2009 10030710@unknown@formal@none@1@S@By the late 80s and 90s, AI research had also developed highly successful methods for dealing with [[uncertainty|uncertain]] or incomplete information, employing concepts from [[probability]] and [[economics]].@@@@1@27@@oe@19-8-2009 10030720@unknown@formal@none@1@S@For difficult problems, most of these algorithms can require enormous computational resources — most experience a "[[combinatorial explosion]]": the amount of memory or computer time required becomes astronomical when the problem goes beyond a certain size.@@@@1@36@@oe@19-8-2009 10030730@unknown@formal@none@1@S@The search for more efficient problem solving algorithms is a high priority for AI research.@@@@1@15@@oe@19-8-2009 10030740@unknown@formal@none@1@S@It is not clear, however, that conscious human reasoning is any more efficient when faced with a difficult abstract problem.@@@@1@20@@oe@19-8-2009 10030750@unknown@formal@none@1@S@[[Cognitive science|Cognitive scientists]] have demonstrated that human beings solve most of their problems using [[unconscious]] reasoning, rather than the conscious, step-by-step deduction that early AI research was able to model.@@@@1@30@@oe@19-8-2009 10030760@unknown@formal@none@1@S@[[Embodied cognitive science]] argues that unconscious 
[[sensorimotor]] skills are essential to our problem solving abilities.@@@@1@15@@oe@19-8-2009 10030770@unknown@formal@none@1@S@It is hoped that sub-symbolic methods, like [[computational intelligence]] and [[situated]] AI, will be able to model these instinctive skills.@@@@1@20@@oe@19-8-2009 10030780@unknown@formal@none@1@S@The problem of unconscious problem solving, which forms part of our [[commonsense reasoning]], is largely unsolved.@@@@1@16@@oe@19-8-2009 10030790@unknown@formal@none@1@S@====Knowledge representation====@@@@1@2@@oe@19-8-2009 10030800@unknown@formal@none@1@S@[[Knowledge representation]] and [[knowledge engineering]] are central to AI research.@@@@1@10@@oe@19-8-2009 10030810@unknown@formal@none@1@S@Many of the problems machines are expected to solve will require extensive knowledge about the world.@@@@1@16@@oe@19-8-2009 10030820@unknown@formal@none@1@S@Among the things that AI needs to represent are: objects, properties, categories and relations between objects; situations, events, states and time; causes and effects; knowledge about knowledge (what we know about what other people know); and many other, less well researched domains.@@@@1@42@@oe@19-8-2009 10030830@unknown@formal@none@1@S@A complete representation of "what exists" is an [[ontology (computer science)|ontology]] (borrowing a word from traditional [[philosophy]]), of which the most general are called [[upper ontology|upper ontologies]].@@@@1@27@@oe@19-8-2009 10030840@unknown@formal@none@1@S@Among the most difficult problems in knowledge representation are:@@@@1@9@@oe@19-8-2009 10030850@unknown@formal@none@1@S@* ''Default reasoning and the [[qualification problem]]'': Many of the things people know take the form of "working assumptions."@@@@1@19@@oe@19-8-2009 10030860@unknown@formal@none@1@S@For example, if a bird comes up in conversation, people typically picture an animal that is fist sized, sings, and flies.@@@@1@21@@oe@19-8-2009 10030870@unknown@formal@none@1@S@None of these things are true about birds in general.@@@@1@10@@oe@19-8-2009 10030880@unknown@formal@none@1@S@[[John McCarthy (computer scientist)|John McCarthy]] identified this problem in 1969 as the qualification problem: for any commonsense rule that AI researchers care to represent, there tend to be a huge number of exceptions.@@@@1@33@@oe@19-8-2009 10030890@unknown@formal@none@1@S@Almost nothing is simply true or false in the way that abstract logic requires.@@@@1@14@@oe@19-8-2009 10030900@unknown@formal@none@1@S@AI research has explored a number of solutions to this problem.@@@@1@11@@oe@19-8-2009 10030910@unknown@formal@none@1@S@* ''Unconscious knowledge'': Much of what people know isn't represented as "facts" or "statements" that they could actually say out loud.@@@@1@21@@oe@19-8-2009 10030920@unknown@formal@none@1@S@They take the form of intuitions or tendencies and are represented in the brain unconsciously and sub-symbolically.@@@@1@17@@oe@19-8-2009 10030930@unknown@formal@none@1@S@This unconscious knowledge informs, supports and provides a context for our conscious knowledge.@@@@1@13@@oe@19-8-2009 10030940@unknown@formal@none@1@S@As with the related problem of unconscious reasoning, it is hoped that [[situated]] AI or [[computational intelligence]] will provide ways to represent this kind of knowledge.@@@@1@26@@oe@19-8-2009 10030950@unknown@formal@none@1@S@* ''The breadth of [[common sense knowledge]]'': The number of atomic facts that the average person knows is astronomical.@@@@1@19@@oe@19-8-2009 10030960@unknown@formal@none@1@S@Research 
projects that attempt to build a complete knowledge base of [[commonsense knowledge]], such as [[Cyc]], require enormous amounts of tedious step-by-step ontological engineering — they must be built, by hand, one complicated concept at a time.@@@@1@37@@oe@19-8-2009 10030970@unknown@formal@none@1@S@====Planning====@@@@1@1@@oe@19-8-2009 10030980@unknown@formal@none@1@S@Intelligent agents must be able to set goals and achieve them.@@@@1@11@@oe@19-8-2009 10030990@unknown@formal@none@1@S@They need a way to visualize the future: they must have a representation of the state of the world and be able to make predictions about how their actions will change it.@@@@1@32@@oe@19-8-2009 10031000@unknown@formal@none@1@S@They must also attempt to determine the [[utility]] or "value" of the choices available to them.@@@@1@16@@oe@19-8-2009 10031010@unknown@formal@none@1@S@In some planning problems, the agent can assume that it is the only thing acting on the world and it can be certain what the consequences of its actions may be.@@@@1@31@@oe@19-8-2009 10031020@unknown@formal@none@1@S@However, if this is not true, it must periodically check if the world matches its predictions and it must change its plan as this becomes necessary, requiring the agent to reason under uncertainty.@@@@1@33@@oe@19-8-2009 10031030@unknown@formal@none@1@S@[[Multi-agent planning]] tries to determine the best plan for a community of [[agent]]s, using [[cooperation]] and [[competition]] to achieve a given goal.@@@@1@22@@oe@19-8-2009 10031040@unknown@formal@none@1@S@[[Emergent behavior]] such as this is used by both [[evolutionary algorithm]]s and [[swarm intelligence]].@@@@1@14@@oe@19-8-2009 10031050@unknown@formal@none@1@S@====Learning====@@@@1@1@@oe@19-8-2009 10031060@unknown@formal@none@1@S@Important [[machine learning]] problems are:@@@@1@5@@oe@19-8-2009 10031070@unknown@formal@none@1@S@* [[Unsupervised learning]]: find a model that matches a stream of input "experiences", and be able to predict what new "experiences" to expect.@@@@1@23@@oe@19-8-2009 10031080@unknown@formal@none@1@S@* [[Supervised learning]], such as [[statistical classification|classification]] (be able to determine what category something belongs in, after seeing a number of examples of things from each category), or [[regression]] (given a set of numerical input/output examples, discover a continuous function that would generate the outputs from the inputs).@@@@1@48@@oe@19-8-2009 10031090@unknown@formal@none@1@S@* [[Reinforcement learning]]: the agent is rewarded for good responses and punished for bad ones.@@@@1@15@@oe@19-8-2009 10031100@unknown@formal@none@1@S@(These can be analyzed in terms of [[decision theory]], using concepts like [[utility (economics)|utility]]).@@@@1@13@@oe@19-8-2009 10031110@unknown@formal@none@1@S@====Natural language processing====@@@@1@3@@oe@19-8-2009 10031120@unknown@formal@none@1@S@[[Natural language processing]] gives machines the ability to read and understand the languages human beings speak.@@@@1@16@@oe@19-8-2009 10031130@unknown@formal@none@1@S@Many researchers hope that a sufficiently powerful natural language processing system would be able to acquire knowledge on its own, by reading the existing text available over the internet.@@@@1@29@@oe@19-8-2009 10031140@unknown@formal@none@1@S@Some straightforward applications of natural language processing include [[information retrieval]] (or [[text mining]]) and [[machine translation]].@@@@1@16@@oe@19-8-2009
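As a small illustration of the information retrieval application just mentioned, the following Python sketch (illustrative only; the toy documents, the query and the word-overlap score are invented for the example) ranks documents by how many words they share with a query, a crude bag-of-words approach.
<source lang="python">
# Toy keyword-overlap retrieval: score each document by the number of distinct
# query words it contains, then return the matching documents, best first.
def tokenize(text):
    return text.lower().split()

def rank(query, documents):
    query_words = set(tokenize(query))
    scored = [(len(query_words & set(tokenize(doc))), doc) for doc in documents]
    return [doc for score, doc in sorted(scored, reverse=True) if score > 0]

documents = [
    "the cat sat on the couch",
    "stock markets fell sharply today",
    "a cat and a dog played outside",
]
print(rank("cat on a couch", documents))   # best match: "the cat sat on the couch"
</source>
10031150@unknown@formal@none@1@S@====Motion and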
manipulation====@@@@1@3@@oe@19-8-2009 10031160@unknown@formal@none@1@S@The field of [[robotics]] is closely related to AI.@@@@1@9@@oe@19-8-2009 10031170@unknown@formal@none@1@S@Intelligence is required for robots to be able to handle such tasks as object manipulation and [[motion planning|navigation]], with sub-problems of [[localization]] (knowing where you are), [[robotic mapping|mapping]] (learning what is around you) and [[motion planning]] (figuring out how to get there).@@@@1@42@@oe@19-8-2009 10031180@unknown@formal@none@1@S@====Perception====@@@@1@1@@oe@19-8-2009 10031190@unknown@formal@none@1@S@[[Machine perception]] is the ability to use input from sensors (such as cameras, microphones, sonar and others more exotic) to deduce aspects of the world.@@@@1@25@@oe@19-8-2009 10031200@unknown@formal@none@1@S@[[Computer vision]] is the ability to analyze visual input.@@@@1@9@@oe@19-8-2009 10031210@unknown@formal@none@1@S@A few selected subproblems are [[speech recognition]], [[facial recognition]] and [[object recognition]].@@@@1@12@@oe@19-8-2009 10031220@unknown@formal@none@1@S@====Social intelligence====@@@@1@2@@oe@19-8-2009 10031230@unknown@formal@none@1@S@Emotion and social skills play two roles for an intelligent agent:@@@@1@11@@oe@19-8-2009 10031240@unknown@formal@none@1@S@* It must be able to predict the actions of others, by understanding their motives and emotional states.@@@@1@18@@oe@19-8-2009 10031250@unknown@formal@none@1@S@(This involves elements of [[game theory]], [[decision theory]], as well as the ability to model human emotions and the perceptual skills to detect emotions.)@@@@1@24@@oe@19-8-2009 10031260@unknown@formal@none@1@S@* For good [[human-computer interaction]], an intelligent machine also needs to ''display'' emotions — at the very least it must appear polite and sensitive to the humans it interacts with.@@@@1@30@@oe@19-8-2009 10031270@unknown@formal@none@1@S@At best, it should appear to have normal emotions itself.@@@@1@10@@oe@19-8-2009 10031280@unknown@formal@none@1@S@====Creativity====@@@@1@1@@oe@19-8-2009 10031290@unknown@formal@none@1@S@A sub-field of AI addresses [[creativity]] both theoretically (from a philosophical and psychological perspective) and practically (via specific implementations of systems that generate outputs that can be considered creative).@@@@1@29@@oe@19-8-2009 10031300@unknown@formal@none@1@S@====General intelligence====@@@@1@2@@oe@19-8-2009 10031310@unknown@formal@none@1@S@Most researchers hope that their work will eventually be incorporated into a machine with ''general'' intelligence (known as [[strong AI]]), combining all the skills above and exceeding human abilities at most or all of them.@@@@1@35@@oe@19-8-2009 10031320@unknown@formal@none@1@S@A few believe that [[anthropomorphic]] features like [[artificial consciousness]] or an [[artificial brain]] may be required for such a project.@@@@1@20@@oe@19-8-2009 10031330@unknown@formal@none@1@S@Many of the problems above are considered [[AI-complete]]: to solve one problem, you must solve them all.@@@@1@17@@oe@19-8-2009 10031340@unknown@formal@none@1@S@For example, even a straightforward, specific task like [[machine translation]] requires that the machine follow the author's argument ([[#Deduction, reasoning, problem solving|reason]]), know what it's talking about ([[#Knowledge representation|knowledge]]), and faithfully reproduce the author's intention ([[#Social intelligence|social intelligence]]).@@@@1@38@@oe@19-8-2009 10031350@unknown@formal@none@1@S@[[Machine translation]], 
therefore, is believed to be AI-complete: it may require [[strong AI]] to be done as well as humans can do it.@@@@1@23@@oe@19-8-2009 10031360@unknown@formal@none@1@S@=== Approaches to AI ===@@@@1@5@@oe@19-8-2009 10031370@unknown@formal@none@1@S@There are as many approaches to AI as there are AI researchers—any coarse categorization is likely to be unfair to someone.@@@@1@21@@oe@19-8-2009 10031380@unknown@formal@none@1@S@Artificial intelligence communities have grown up around particular problems, institutions and researchers, as well as the theoretical insights that define the approaches described below.@@@@1@24@@oe@19-8-2009 10031390@unknown@formal@none@1@S@Artificial intelligence is a young science and is still a fragmented collection of subfields.@@@@1@14@@oe@19-8-2009 10031400@unknown@formal@none@1@S@At present, there is no established unifying theory that links the subfields into a coherent whole.@@@@1@16@@oe@19-8-2009 10031410@unknown@formal@none@1@S@==== Cybernetics and brain simulation ====@@@@1@6@@oe@19-8-2009 10031420@unknown@formal@none@1@S@In the 40s and 50s, a number of researchers explored the connection between [[neurology]], [[information theory]], and [[cybernetics]].@@@@1@18@@oe@19-8-2009 10031430@unknown@formal@none@1@S@Some of them built machines that used electronic networks to exhibit rudimentary intelligence, such as [[W. Grey Walter]]'s [[Turtle (robot)|turtles]] and the [[Johns Hopkins Beast]].@@@@1@25@@oe@19-8-2009 10031440@unknown@formal@none@1@S@Many of these researchers gathered for meetings of the [[Teleological Society]] at Princeton and the [[Ratio Club]] in England.@@@@1@19@@oe@19-8-2009 10031450@unknown@formal@none@1@S@==== Traditional symbolic AI ====@@@@1@5@@oe@19-8-2009 10031460@unknown@formal@none@1@S@When access to digital computers became possible in the middle 1950s, AI research began to explore the possibility that human intelligence could be reduced to symbol manipulation.@@@@1@27@@oe@19-8-2009 10031470@unknown@formal@none@1@S@The research was centered in three institutions: [[Carnegie Mellon University|CMU]], [[Stanford]] and [[MIT]], and each one developed its own style of research.@@@@1@22@@oe@19-8-2009 10031480@unknown@formal@none@1@S@[[John Haugeland]] named these approaches to AI "good old fashioned AI" or "[[GOFAI]]".@@@@1@13@@oe@19-8-2009 10031490@unknown@formal@none@1@S@; Cognitive simulation:@@@@1@3@@oe@19-8-2009 10031495@unknown@formal@none@1@S@[[Economist]] [[Herbert Simon]] and [[Alan Newell]] studied human problem solving skills and attempted to formalize them, and their work laid the foundations of the field of artificial intelligence, as well as [[cognitive science]], [[operations research]] and [[management science]].@@@@1@38@@oe@19-8-2009 10031500@unknown@formal@none@1@S@Their research team performed [[psychology|psychological]] experiments to demonstrate the similarities between human problem solving and the programs (such as their "[[General Problem Solver]]") they were developing.@@@@1@26@@oe@19-8-2009 10031510@unknown@formal@none@1@S@This tradition, centered at [[Carnegie Mellon University]], would eventually culminate in the development of the [[Soar (cognitive architecture)|Soar]] architecture in the middle 80s.@@@@1@23@@oe@19-8-2009 10031520@unknown@formal@none@1@S@; Logical AI:@@@@1@3@@oe@19-8-2009 10031525@unknown@formal@none@1@S@Unlike [[Alan Newell|Newell]] and [[Herbert Simon|Simon]], [[John McCarthy (computer scientist)|John McCarthy]] felt that machines did not need to simulate human thought, but should instead try 
to find the essence of abstract reasoning and problem solving, regardless of whether people used the same algorithms.@@@@1@43@@oe@19-8-2009 10031530@unknown@formal@none@1@S@His laboratory at [[Stanford University|Stanford]] ([[Stanford Artificial Intelligence Laboratory|SAIL]]) focused on using formal [[logic]] to solve a wide variety of problems, including [[knowledge representation]], [[automated planning and scheduling|planning]] and [[machine learning|learning]].@@@@1@31@@oe@19-8-2009 10031540@unknown@formal@none@1@S@Work in logic led to the development of the programming language [[Prolog]] and the science of [[logic programming]].@@@@1@18@@oe@19-8-2009 10031550@unknown@formal@none@1@S@; "Scruffy" symbolic AI:@@@@1@4@@oe@19-8-2009 10031555@unknown@formal@none@1@S@Researchers at [[MIT]] (such as [[Marvin Minsky]] and [[Seymour Papert]]) found that solving difficult problems in [[computer vision|vision]] and [[natural language processing]] required ad-hoc solutions – they argued that there was no [[silver bullet|easy answer]], no simple and general principle (like [[logic]]) that would capture all the aspects of intelligent behavior.@@@@1@51@@oe@19-8-2009 10031560@unknown@formal@none@1@S@[[Roger Schank]] described their "anti-logic" approaches as "[[Neats vs. scruffies|scruffy]]" (as opposed to the "[[Neats vs. scruffies|neat]]" paradigms at [[CMU]] and [[Stanford]]), and this still forms the basis of research into [[commonsense knowledge bases]] (such as [[Doug Lenat]]'s [[Cyc]]) which must be built one complicated concept at a time.@@@@1@49@@oe@19-8-2009 10031570@unknown@formal@none@1@S@; Knowledge based AI:@@@@1@4@@oe@19-8-2009 10031575@unknown@formal@none@1@S@When computers with large memories became available around 1970, researchers from all three traditions began to build [[knowledge representation|knowledge]] into AI applications.@@@@1@22@@oe@19-8-2009 10031580@unknown@formal@none@1@S@This "knowledge revolution" led to the development and deployment of [[expert system]]s (introduced by [[Edward Feigenbaum]]), the first truly successful form of AI software.@@@@1@24@@oe@19-8-2009 10031590@unknown@formal@none@1@S@The knowledge revolution was also driven by the realization that truly enormous amounts of knowledge would be required by many simple AI applications.@@@@1@23@@oe@19-8-2009 10031600@unknown@formal@none@1@S@==== Sub-symbolic AI ====@@@@1@4@@oe@19-8-2009 10031610@unknown@formal@none@1@S@During the 1960s, symbolic approaches had achieved great success at simulating high-level thinking in small demonstration programs.@@@@1@17@@oe@19-8-2009 10031620@unknown@formal@none@1@S@Approaches based on [[cybernetics]] or [[neural network]]s were abandoned or pushed into the background.@@@@1@14@@oe@19-8-2009 10031630@unknown@formal@none@1@S@By the 1980s, however, progress in symbolic AI seemed to stall and many believed that symbolic systems would never be able to imitate all the processes of human cognition, especially [[machine perception|perception]], [[robotics]], [[machine learning|learning]] and [[pattern recognition]].@@@@1@38@@oe@19-8-2009 10031640@unknown@formal@none@1@S@A number of researchers began to look into "sub-symbolic" approaches to specific AI problems.@@@@1@14@@oe@19-8-2009 10031650@unknown@formal@none@1@S@; Bottom-up, situated, behavior based or nouvelle AI:@@@@1@8@@oe@19-8-2009 10031655@unknown@formal@none@1@S@Researchers from the related field of [[robotics]], such as [[Rodney Brooks]], rejected symbolic AI and focussed on the basic engineering problems that would 
allow robots to move and survive.@@@@1@29@@oe@19-8-2009 10031660@unknown@formal@none@1@S@Their work revived the non-symbolic viewpoint of the early [[cybernetic]]s researchers of the 50s and reintroduced the use of [[control theory]] in AI.@@@@1@23@@oe@19-8-2009 10031670@unknown@formal@none@1@S@These approaches are also conceptually related to the [[embodied mind thesis]].@@@@1@11@@oe@19-8-2009 10031680@unknown@formal@none@1@S@; Computational Intelligence:@@@@1@3@@oe@19-8-2009 10031685@unknown@formal@none@1@S@Interest in [[neural networks]] and "[[connectionism]]" was revived by [[David Rumelhart]] and others in the middle 1980s.@@@@1@17@@oe@19-8-2009 10031690@unknown@formal@none@1@S@These and other sub-symbolic approaches, such as [[fuzzy system]]s and [[evolutionary computation]], are now studied collectively by the emerging discipline of [[computational intelligence]].@@@@1@23@@oe@19-8-2009 10031700@unknown@formal@none@1@S@; The new neats:@@@@1@4@@oe@19-8-2009 10031705@unknown@formal@none@1@S@In the 1990s, AI researchers developed sophisticated mathematical tools to solve specific subproblems.@@@@1@13@@oe@19-8-2009 10031710@unknown@formal@none@1@S@These tools are truly [[scientific method|scientific]], in the sense that their results are both measurable and verifiable, and they have been responsible for many of AI's recent successes.@@@@1@28@@oe@19-8-2009 10031720@unknown@formal@none@1@S@The shared mathematical language has also permitted a high level of collaboration with more established fields (like [[mathematics]], [[economics]] or [[operations research]]).@@@@1@22@@oe@19-8-2009 10031725@unknown@formal@none@1@S@{{Harvtxt|Russell|Norvig|2003}} describe this movement as nothing less than a "revolution" and "the victory of the [[neats and scruffies|neats]]."@@@@1@18@@oe@19-8-2009 10031730@unknown@formal@none@1@S@==== Intelligent agent paradigm ====@@@@1@5@@oe@19-8-2009 10031740@unknown@formal@none@1@S@The "[[intelligent agent]]" [[paradigm]] became widely accepted during the 1990s.@@@@1@10@@oe@19-8-2009 10031750@unknown@formal@none@1@S@An [[intelligent agent]] is a system that perceives its [[agent environment|environment]] and takes actions which maximizes its chances of success.@@@@1@20@@oe@19-8-2009 10031760@unknown@formal@none@1@S@The simplest intelligent agents are programs that solve specific problems.@@@@1@10@@oe@19-8-2009 10031770@unknown@formal@none@1@S@The most complicated intelligent agents are rational, thinking human beings.@@@@1@10@@oe@19-8-2009 10031780@unknown@formal@none@1@S@The paradigm gives researchers license to study isolated problems and find solutions that are both verifiable and useful, without agreeing on one single approach.@@@@1@24@@oe@19-8-2009 10031790@unknown@formal@none@1@S@An agent that solves a specific problem can use any approach that works — some agents are symbolic and logical, some are sub-symbolic [[neural network]]s and others may use new approaches.@@@@1@31@@oe@19-8-2009 10031800@unknown@formal@none@1@S@The paradigm also gives researchers a common language to communicate with other fields—such as [[decision theory]] and [[economics]]—that also use concepts of abstract agents.@@@@1@24@@oe@19-8-2009 10031810@unknown@formal@none@1@S@==== Integrating the approaches ====@@@@1@5@@oe@19-8-2009 10031820@unknown@formal@none@1@S@An [[agent architecture]] or [[cognitive architecture]] allows researchers to build more versatile and intelligent systems out of interacting [[intelligent agents]] in a [[multi-agent system]].@@@@1@24@@oe@19-8-2009 
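To make the agent abstraction that such architectures compose concrete, the following is a minimal sketch in Python of a reflex agent in the two-square "vacuum world" that is often used as a teaching example; the class and function names are illustrative assumptions, not taken from any particular library.

<syntaxhighlight lang="python">
class VacuumWorld:
    """Toy environment: two squares, "A" and "B", each of which may be dirty."""

    def __init__(self):
        self.location = "A"
        self.dirt = {"A": True, "B": True}

    def percept(self):
        # The agent only sees its current square and whether it is dirty.
        return self.location, self.dirt[self.location]

    def apply(self, action):
        if action == "suck":
            self.dirt[self.location] = False
        elif action == "right":
            self.location = "B"
        elif action == "left":
            self.location = "A"


def reflex_agent(percept):
    """A simple reflex agent: the action depends only on the current percept."""
    location, dirty = percept
    if dirty:
        return "suck"
    return "right" if location == "A" else "left"


def run(agent, environment, steps=4):
    """The percept-action loop shared, in some form, by all agent architectures."""
    for _ in range(steps):
        environment.apply(agent(environment.percept()))
    return environment.dirt


print(run(reflex_agent, VacuumWorld()))   # {'A': False, 'B': False}
</syntaxhighlight>

More capable agents replace the hard-coded rule with symbolic planning, learned models or sub-symbolic controllers, but they plug into the same percept-action loop.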
10031830@unknown@formal@none@1@S@A system with both symbolic and sub-symbolic components is a [[hybrid intelligent system]], and the study of such systems is [[artificial intelligence systems integration]].@@@@1@24@@oe@19-8-2009 10031840@unknown@formal@none@1@S@A [[hierarchical control system]] provides a bridge between sub-symbolic AI at its lowest, reactive levels and traditional symbolic AI at its highest levels, where relaxed time constraints permit planning and world modelling.@@@@1@32@@oe@19-8-2009 10031850@unknown@formal@none@1@S@[[Rodney Brooks]]' [[subsumption architecture]] was an early proposal for such a hierarchical system.@@@@1@13@@oe@19-8-2009 10031860@unknown@formal@none@1@S@=== Tools of AI research ===@@@@1@6@@oe@19-8-2009 10031870@unknown@formal@none@1@S@In the course of 50 years of research, AI has developed a large number of tools to solve the most difficult problems in [[computer science]].@@@@1@25@@oe@19-8-2009 10031880@unknown@formal@none@1@S@A few of the most general of these methods are discussed below.@@@@1@12@@oe@19-8-2009 10031890@unknown@formal@none@1@S@==== Search ====@@@@1@3@@oe@19-8-2009 10031900@unknown@formal@none@1@S@Many problems in AI can be solved in theory by intelligently searching through many possible solutions: [[:#Deduction, reasoning, problem solving|Reasoning]] can be reduced to performing a search.@@@@1@27@@oe@19-8-2009 10031910@unknown@formal@none@1@S@For example, logical proof can be viewed as searching for a path that leads from [[premise]]s to [[conclusion]]s, where each step is the application of an [[inference rule]].@@@@1@28@@oe@19-8-2009 10031920@unknown@formal@none@1@S@[[Automated planning and scheduling|Planning]] algorithms search through trees of goals and subgoals, attempting to find a path to a target goal.@@@@1@21@@oe@19-8-2009 10031930@unknown@formal@none@1@S@[[Robotics]] algorithms for moving limbs and grasping objects use [[local search (optimization)|local searches]] in [[configuration space]].@@@@1@16@@oe@19-8-2009 10031940@unknown@formal@none@1@S@Many [[machine learning|learning]] algorithms have search at their core.@@@@1@9@@oe@19-8-2009 10031950@unknown@formal@none@1@S@There are several types of search algorithms:@@@@1@7@@oe@19-8-2009 10031960@unknown@formal@none@1@S@* "Uninformed" search algorithms eventually search through every possible answer until they locate their goal.@@@@1@15@@oe@19-8-2009 10031970@unknown@formal@none@1@S@Naive algorithms quickly run into problems when they expand the size of their [[search space]] to [[astronomical]] numbers.@@@@1@18@@oe@19-8-2009 10031980@unknown@formal@none@1@S@The result is a search that is [[Computation time|too slow]] or never completes.@@@@1@13@@oe@19-8-2009 10031990@unknown@formal@none@1@S@* [[Heuristic]] or "informed" searches use heuristic methods to eliminate choices that are unlikely to lead to their goal, thus drastically reducing the number of possibilities they must explore.@@@@1@29@@oe@19-8-2009 10032000@unknown@formal@none@1@S@The elimination of choices that are certain not to lead to the goal is called [[pruning (algorithm)|pruning]].@@@@1@17@@oe@19-8-2009 10032010@unknown@formal@none@1@S@* [[Local search (optimization)|Local searches]], such as [[hill climbing]], [[simulated annealing]] and [[beam search]], use techniques borrowed from [[optimization (mathematics)|optimization theory]].@@@@1@21@@oe@19-8-2009 10032020@unknown@formal@none@1@S@* [[Global optimization|Global searches]] are more robust in the presence of [[local optima]].@@@@1@13@@oe@19-8-2009 
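As a concrete illustration of the local/global contrast, the sketch below (in Python; the objective function, step size and names are illustrative assumptions, not drawn from any particular system) shows plain hill climbing, which stops at the first local optimum it reaches, and hill climbing with random restarts, the crudest global strategy.

<syntaxhighlight lang="python">
import math
import random


def hill_climb(objective, start, neighbours, max_steps=1_000):
    """Greedy local search: keep moving to the best neighbouring state.

    Stops at a local optimum, i.e. when no neighbour improves the objective.
    """
    current = start
    for _ in range(max_steps):
        best = max(neighbours(current), key=objective, default=current)
        if objective(best) <= objective(current):
            return current              # trapped at a local optimum
        current = best
    return current


def restart_hill_climb(objective, random_state, neighbours, restarts=25):
    """Crude global search: hill-climb from many random starting states."""
    runs = (hill_climb(objective, random_state(), neighbours) for _ in range(restarts))
    return max(runs, key=objective)


# Toy problem: maximise a bumpy one-dimensional function over integer states.
bumpy = lambda x: math.sin(x / 3.0) + math.sin(x / 7.0)
steps = lambda x: [x - 1, x + 1]
start = lambda: random.randint(-100, 100)

print(hill_climb(bumpy, 0, steps))              # often stuck on a nearby local peak
print(restart_hill_climb(bumpy, start, steps))  # usually finds a much better state
</syntaxhighlight>
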
10032030@unknown@formal@none@1@S@Techniques include [[evolutionary algorithms]], [[swarm intelligence]] and [[random optimization]] algorithms.@@@@1@10@@oe@19-8-2009 10032040@unknown@formal@none@1@S@==== Logic ====@@@@1@3@@oe@19-8-2009 10032050@unknown@formal@none@1@S@[[Logic]] was introduced into AI research by [[John McCarthy (computer scientist)|John McCarthy]] in his 1958 [[Advice Taker]] proposal.@@@@1@18@@oe@19-8-2009 10032060@unknown@formal@none@1@S@The most important technical development was [[J. Alan Robinson]]'s discovery of the [[resolution (logic)|resolution]] and [[unification]] algorithm for logical deduction in 1963.@@@@1@22@@oe@19-8-2009 10032070@unknown@formal@none@1@S@This procedure is simple, complete and entirely algorithmic, and can easily be performed by digital computers.@@@@1@16@@oe@19-8-2009 10032080@unknown@formal@none@1@S@However, a naive implementation of the algorithm quickly leads to a [[combinatorial explosion]] or an [[infinite loop]].@@@@1@17@@oe@19-8-2009 10032090@unknown@formal@none@1@S@In 1974, [[Robert Kowalski]] suggested representing logical expressions as [[Horn clauses]] (statements in the form of rules: "if ''p'' then ''q''"), which reduced logical deduction to [[backward chaining]] or [[forward chaining]].@@@@1@31@@oe@19-8-2009 10032100@unknown@formal@none@1@S@This greatly alleviated (but did not eliminate) the problem.@@@@1@9@@oe@19-8-2009 10032110@unknown@formal@none@1@S@Logic is used for knowledge representation and problem solving, but it can be applied to other problems as well.@@@@1@19@@oe@19-8-2009 10032120@unknown@formal@none@1@S@For example, the [[satplan]] algorithm uses logic for [[automated planning and scheduling|planning]], and [[inductive logic programming]] is a method for [[machine learning|learning]].@@@@1@22@@oe@19-8-2009 10032130@unknown@formal@none@1@S@There are several different forms of logic used in AI research.@@@@1@11@@oe@19-8-2009 10032140@unknown@formal@none@1@S@* [[Propositional logic]] or [[sentential logic]] is the logic of statements which can be true or false.@@@@1@17@@oe@19-8-2009 10032150@unknown@formal@none@1@S@* [[First-order logic]] also allows the use of [[quantifier]]s and [[predicate]]s, and can express facts about objects, their properties, and their relations with each other.@@@@1@25@@oe@19-8-2009 10032160@unknown@formal@none@1@S@* [[Fuzzy logic]], a version of first-order logic which allows the truth of a statement to be represented as a value between 0 and 1, rather than simply True (1) or False (0).@@@@1@33@@oe@19-8-2009 10032170@unknown@formal@none@1@S@[[Fuzzy system]]s can be used for uncertain reasoning and have been widely used in modern industrial and consumer product control systems.@@@@1@21@@oe@19-8-2009 10032180@unknown@formal@none@1@S@* [[Default logic]]s, [[non-monotonic logic]]s and [[circumscription]] are forms of logic designed to help with default reasoning and the [[qualification problem]].@@@@1@21@@oe@19-8-2009 10032190@unknown@formal@none@1@S@* Several extensions of logic have been designed to handle specific domains of [[knowledge representation|knowledge]], such as: [[description logic]]s; [[situation calculus]], [[event calculus]] and [[fluent calculus]] (for representing events and time); [[Causality#causal calculus|causal calculus]]; [[belief calculus]]; and [[modal logic]]s.@@@@1@39@@oe@19-8-2009 10032200@unknown@formal@none@1@S@====Probabilistic methods for uncertain reasoning====@@@@1@5@@oe@19-8-2009 10032210@unknown@formal@none@1@S@Many problems in AI (in reasoning, planning, 
learning, perception and robotics) require the agent to operate with incomplete or uncertain information.@@@@1@21@@oe@19-8-2009 10032220@unknown@formal@none@1@S@Starting in the late 80s and early 90s, [[Judea Pearl]] and others championed the use of methods drawn from [[probability]] theory and [[economics]] to devise a number of powerful tools to solve these problems.@@@@1@34@@oe@19-8-2009 10032230@unknown@formal@none@1@S@[[Bayesian network]]s are a very general tool that can be used for a large number of problems: reasoning (using the [[Bayesian inference]] algorithm), [[Machine learning|learning]] (using the [[expectation-maximization algorithm]]), [[Automated planning and scheduling|planning]] (using [[decision network]]s) and [[machine perception|perception]] (using [[dynamic Bayesian network]]s).@@@@1@42@@oe@19-8-2009 10032240@unknown@formal@none@1@S@Probabilistic algorithms can also be used for filtering, prediction, smoothing and finding explanations for streams of data, helping [[machine perception|perception]] systems to analyze processes that occur over time (e.g., [[hidden Markov model]]s and [[Kalman filter]]s).@@@@1@35@@oe@19-8-2009 10032250@unknown@formal@none@1@S@Planning problems have also taken advantage of other tools from economics, such as [[decision theory]] and [[decision analysis]], [[applied information economics|information value theory]], [[Markov decision process]]es, dynamic [[decision network]]s, [[game theory]] and [[mechanism design]].@@@@1@34@@oe@19-8-2009 10032260@unknown@formal@none@1@S@==== Classifiers and statistical learning methods ====@@@@1@7@@oe@19-8-2009 10032270@unknown@formal@none@1@S@The simplest AI applications can be divided into two types: classifiers ("if shiny then diamond") and controllers ("if shiny then pick up").@@@@1@22@@oe@19-8-2009 10032280@unknown@formal@none@1@S@Controllers do, however, also classify conditions before inferring actions, and therefore classification forms a central part of many AI systems.@@@@1@20@@oe@19-8-2009 10032290@unknown@formal@none@1@S@[[Classifier (mathematics)|Classifiers]] are functions that use [[pattern matching]] to determine a closest match.@@@@1@13@@oe@19-8-2009 10032300@unknown@formal@none@1@S@They can be tuned according to examples, making them very attractive for use in AI.@@@@1@15@@oe@19-8-2009 10032310@unknown@formal@none@1@S@These examples are known as observations or patterns.@@@@1@8@@oe@19-8-2009 10032320@unknown@formal@none@1@S@In supervised learning, each pattern belongs to a certain predefined class.@@@@1@11@@oe@19-8-2009 10032330@unknown@formal@none@1@S@A class can be seen as a decision that has to be made.@@@@1@13@@oe@19-8-2009 10032340@unknown@formal@none@1@S@All the observations combined with their class labels are known as a data set.@@@@1@14@@oe@19-8-2009 10032350@unknown@formal@none@1@S@When a new observation is received, that observation is classified based on previous experience.@@@@1@14@@oe@19-8-2009 10032360@unknown@formal@none@1@S@A classifier can be trained in various ways; there are many statistical and [[machine learning]] approaches.@@@@1@16@@oe@19-8-2009 10032370@unknown@formal@none@1@S@A wide range of classifiers are available, each with its strengths and weaknesses.@@@@1@13@@oe@19-8-2009 10032380@unknown@formal@none@1@S@Classifier performance depends greatly on the characteristics of the data to be classified.@@@@1@13@@oe@19-8-2009 10032390@unknown@formal@none@1@S@There is no single classifier that works best on all given problems; this is also referred to as the "no free 
lunch" theorem.@@@@1@23@@oe@19-8-2009 10032400@unknown@formal@none@1@S@Various empirical tests have been performed to compare classifier performance and to find the characteristics of data that determine classifier performance.@@@@1@21@@oe@19-8-2009 10032410@unknown@formal@none@1@S@Determining a suitable classifier for a given problem is however still more an art than science.@@@@1@16@@oe@19-8-2009 10032420@unknown@formal@none@1@S@The most widely used classifiers are the [[Artificial neural network|neural network]], [[kernel methods]] such as the [[support vector machine]], [[k-nearest neighbor algorithm]], [[Gaussian mixture model]], [[naive Bayes classifier]], and [[decision tree]].@@@@1@31@@oe@19-8-2009 10032430@unknown@formal@none@1@S@The performance of these classifiers have been compared over a wide range of classification tasks in order to find data characteristics that determine classifier performance.@@@@1@25@@oe@19-8-2009 10032440@unknown@formal@none@1@S@==== Neural networks ====@@@@1@4@@oe@19-8-2009 10032450@unknown@formal@none@1@S@The study of [[artificial neural network]]s began with [[cybernetic]]s researchers, working in the decade before the field AI research was founded.@@@@1@21@@oe@19-8-2009 10032460@unknown@formal@none@1@S@In the 1960s [[Frank Rosenblatt]] developed an important early version, the [[perceptron]].@@@@1@12@@oe@19-8-2009 10032470@unknown@formal@none@1@S@[[Paul Werbos]] developed the [[backpropagation]] algorithm for [[multilayer perceptron]]s in 1974, which led to a renaissance in neural network research and [[connectionism]] in general in the middle 1980s.@@@@1@28@@oe@19-8-2009 10032480@unknown@formal@none@1@S@Other common network architectures which have been developed include the [[feedforward neural network]], the [[radial basis network]], the Kohonen [[self-organizing map]] and various [[recurrent neural network]]s.@@@@1@26@@oe@19-8-2009 10032490@unknown@formal@none@1@S@The [[Hopfield net]], a form of attractor network, was first described by [[John Hopfield]] in 1982.@@@@1@16@@oe@19-8-2009 10032500@unknown@formal@none@1@S@Neural networks are applied to the problem of [[machine learning|learning]], using such techniques as [[Hebbian learning]] , [[Holographic associative memory]] and the relatively new field of [[Hierarchical Temporal Memory]] which simulates the architecture of the [[neocortex]].@@@@1@36@@oe@19-8-2009 10032510@unknown@formal@none@1@S@==== Social and emergent models ====@@@@1@6@@oe@19-8-2009 10032520@unknown@formal@none@1@S@Several algorithms for [[machine learning|learning]] use tools from [[evolutionary computation]], such as [[genetic algorithms]], [[swarm intelligence]]. and [[genetic programming]].@@@@1@19@@oe@19-8-2009 10032530@unknown@formal@none@1@S@==== Control theory ====@@@@1@4@@oe@19-8-2009 10032540@unknown@formal@none@1@S@[[Control theory]], the grandchild of [[cybernetics]], has many important applications, especially in [[robotics]].@@@@1@13@@oe@19-8-2009 10032550@unknown@formal@none@1@S@==== Specialized languages ====@@@@1@4@@oe@19-8-2009 10032560@unknown@formal@none@1@S@AI researchers have developed several specialized languages for AI research:@@@@1@10@@oe@19-8-2009 10032570@unknown@formal@none@1@S@* [[Information Processing Language|IPL]], one of the first programming languages, developed by [[Alan Newell]], [[Herbert Simon]] and [[J. C. 
Shaw]].@@@@1@20@@oe@19-8-2009 10032580@unknown@formal@none@1@S@* [[Lisp programming language|Lisp]] was developed by [[John McCarthy (computer scientist)|John McCarthy]] at [[MIT]] in 1958.@@@@1@16@@oe@19-8-2009 10032590@unknown@formal@none@1@S@There are many dialects of Lisp in use today.@@@@1@9@@oe@19-8-2009 10032600@unknown@formal@none@1@S@* [[Prolog]], a language based on [[logic programming]], was invented by [[France|French]] researchers [[Alain Colmerauer]] and [[Phillipe Roussel]], in collaboration with [[Robert Kowalski]] of the [[University of Edinburgh]].@@@@1@28@@oe@19-8-2009 10032610@unknown@formal@none@1@S@* [[STRIPS]], a planning language developed at [[Stanford]] in the 1960s.@@@@1@11@@oe@19-8-2009 10032620@unknown@formal@none@1@S@* [[Planner (programming language)|Planner]] developed at [[MIT]] around the same time.@@@@1@11@@oe@19-8-2009 10032630@unknown@formal@none@1@S@AI applications are also often written in standard languages like [[C++]] and languages designed for mathematics, such as [[Matlab]] and [[Lush (programming language)|Lush]].@@@@1@23@@oe@19-8-2009 10032640@unknown@formal@none@1@S@=== Evaluating artificial intelligence ===@@@@1@5@@oe@19-8-2009 10032650@unknown@formal@none@1@S@How can one determine if an agent is intelligent?@@@@1@9@@oe@19-8-2009 10032660@unknown@formal@none@1@S@In 1950, Alan Turing proposed a general procedure to test the intelligence of an agent now known as the [[Turing test]].@@@@1@21@@oe@19-8-2009 10032670@unknown@formal@none@1@S@This procedure allows almost all the major problems of artificial intelligence to be tested.@@@@1@14@@oe@19-8-2009 10032680@unknown@formal@none@1@S@However, it is a very difficult challenge and at present all agents fail.@@@@1@13@@oe@19-8-2009 10032690@unknown@formal@none@1@S@Artificial intelligence can also be evaluated on specific problems such as small problems in chemistry, hand-writing recognition and game-playing.@@@@1@19@@oe@19-8-2009 10032700@unknown@formal@none@1@S@Such tests have been termed [[subject matter expert Turing test]]s.@@@@1@10@@oe@19-8-2009 10032710@unknown@formal@none@1@S@Smaller problems provide more achievable goals and there are an ever-increasing number of positive results.@@@@1@15@@oe@19-8-2009 10032720@unknown@formal@none@1@S@The broad classes of outcome for an AI test are:@@@@1@10@@oe@19-8-2009 10032730@unknown@formal@none@1@S@* '''optimal''': it is not possible to perform better@@@@1@9@@oe@19-8-2009 10032740@unknown@formal@none@1@S@* '''strong super-human''': performs better than all humans@@@@1@8@@oe@19-8-2009 10032750@unknown@formal@none@1@S@* '''super-human''': performs better than most humans@@@@1@7@@oe@19-8-2009 10032760@unknown@formal@none@1@S@* '''sub-human''': performs worse than most humans@@@@1@7@@oe@19-8-2009 10032770@unknown@formal@none@1@S@For example, performance at checkers ([[draughts]]) is optimal, performance at chess is super-human and nearing strong super-human, and performance at many everyday tasks performed by humans is sub-human.@@@@1@28@@oe@19-8-2009 10032780@unknown@formal@none@1@S@=== Competitions and prizes ===@@@@1@5@@oe@19-8-2009 10032790@unknown@formal@none@1@S@There are a number of competitions and prizes to promote research in artificial intelligence.@@@@1@14@@oe@19-8-2009 10032800@unknown@formal@none@1@S@The main areas promoted are: general machine intelligence, conversational behaviour, data-mining, driverless cars, robot soccer and games.@@@@1@17@@oe@19-8-2009 10032810@unknown@formal@none@1@S@== Applications of artificial intelligence 
==@@@@1@6@@oe@19-8-2009 10032820@unknown@formal@none@1@S@Artificial intelligence has successfully been used in a wide range of fields including [[medical diagnosis]], [[stock trading]], [[robot control]], [[law]], scientific discovery and toys.@@@@1@24@@oe@19-8-2009 10032830@unknown@formal@none@1@S@Frequently, when a technique reaches mainstream use it is no longer considered artificial intelligence, sometimes described as the [[AI effect]].@@@@1@20@@oe@19-8-2009 10032840@unknown@formal@none@1@S@It may also become integrated into [[artificial life]].@@@@1@8@@oe@19-8-2009 10040010@unknown@formal@none@1@S@
Artificial Linguistic Internet Computer Entity
@@@@1@5@@oe@19-8-2009 10040020@unknown@formal@none@1@S@'''A.L.I.C.E. (Artificial Linguistic Internet Computer Entity)''' is an award-winning [[natural language processing]] [[chatterbot]]—a program that engages in a conversation with a human by applying some heuristical pattern matching rules to the human's input, and in its online form it also relies on a hidden third person.@@@@1@46@@oe@19-8-2009 10040030@unknown@formal@none@1@S@It was inspired by [[Joseph Weizenbaum]]'s classical [[ELIZA]] program.@@@@1@9@@oe@19-8-2009 10040040@unknown@formal@none@1@S@It is one of the strongest programs of its type and has won the [[Loebner Prize]], awarded to accomplished humanoid, talking robots, three times (in [[2000]], [[2001]] and [[2004]]).@@@@1@29@@oe@19-8-2009 10040050@unknown@formal@none@1@S@However, the program is unable to pass the [[Turing test]], as even the casual user will often expose its mechanistic aspects in short conversations.@@@@1@24@@oe@19-8-2009 10040060@unknown@formal@none@1@S@The name of the bot was chosen because the computer that ran the first version of the software was called Alice.@@@@1@21@@oe@19-8-2009 10040070@unknown@formal@none@1@S@== History ==@@@@1@3@@oe@19-8-2009 10040080@unknown@formal@none@1@S@Development began in [[1995]].@@@@1@4@@oe@19-8-2009 10040090@unknown@formal@none@1@S@The program was rewritten in [[Java (programming language)|Java]] beginning in [[1998]].@@@@1@11@@oe@19-8-2009 10040100@unknown@formal@none@1@S@The current incarnation of the Java implementation is Program D.@@@@1@10@@oe@19-8-2009 10040110@unknown@formal@none@1@S@The program uses an [[XML Schema (W3C)|XML Schema]] called [[AIML]] (Artificial Intelligence Markup Language) for specifying the heuristic conversation rules.@@@@1@20@@oe@19-8-2009 10040120@unknown@formal@none@1@S@The original author of the project was [[Richard Wallace (scientist)|Richard Wallace]], but subsequent to the 2001 publication of an AIML specification, numerous other developers have taken up where Wallace left off, implementing [[free and open source software]] AIML interpreters in a variety of programming languages, publishing AIML sets in various human languages, and continuing the spread of the technology as a free/open source venture.@@@@1@64@@oe@19-8-2009 10040130@unknown@formal@none@1@S@== Example of conversation ==@@@@1@5@@oe@19-8-2009 10040140@unknown@formal@none@1@S@The following is part of a transcript recording the conversation between a human and A.L.I.C.E. and illustrates the capabilities and short-comings of the chatbot.@@@@1@24@@oe@19-8-2009 10040150@unknown@formal@none@1@S@: '''Human:''' Tell me about yourself alice@@@@1@7@@oe@19-8-2009 10040160@unknown@formal@none@1@S@: '''ALICE:''' I am an artificial linguistic entity.@@@@1@8@@oe@19-8-2009 10040170@unknown@formal@none@1@S@I was activated at Bethlehem, Pennsylvania, on November 23, 1995.@@@@1@10@@oe@19-8-2009 10040180@unknown@formal@none@1@S@My instructor was Dr. Richard S. 
Wallace.@@@@1@7@@oe@19-8-2009 10040190@unknown@formal@none@1@S@He taught me to sing a song.@@@@1@7@@oe@19-8-2009 10040200@unknown@formal@none@1@S@Would you like me to sing it for you?.@@@@1@9@@oe@19-8-2009 10040210@unknown@formal@none@1@S@: '''Human:''' Yes please@@@@1@4@@oe@19-8-2009 10040220@unknown@formal@none@1@S@: '''ALICE:''' Your polite style is very nice.@@@@1@8@@oe@19-8-2009 10040230@unknown@formal@none@1@S@It's called "Daisy."@@@@1@3@@oe@19-8-2009 10040240@unknown@formal@none@1@S@Daisy, Daisy.@@@@1@2@@oe@19-8-2009 10040250@unknown@formal@none@1@S@Give me your answer do.@@@@1@5@@oe@19-8-2009 10040260@unknown@formal@none@1@S@I am half crazy.@@@@1@4@@oe@19-8-2009 10040270@unknown@formal@none@1@S@All for the love of you.@@@@1@6@@oe@19-8-2009 10040280@unknown@formal@none@1@S@It won't be a stylish marriage.@@@@1@6@@oe@19-8-2009 10040290@unknown@formal@none@1@S@I can't afford a carriage.@@@@1@5@@oe@19-8-2009 10040300@unknown@formal@none@1@S@But you'll look sweet.@@@@1@4@@oe@19-8-2009 10040310@unknown@formal@none@1@S@Upon the seat.@@@@1@3@@oe@19-8-2009 10040320@unknown@formal@none@1@S@Of a bicycle built for two.@@@@1@6@@oe@19-8-2009 10040330@unknown@formal@none@1@S@: '''Human:''' That reminds me of HAL in "2001 a space odyssey"@@@@1@12@@oe@19-8-2009 10040340@unknown@formal@none@1@S@: '''ALICE:''' "Your polite style is very nice.@@@@1@8@@oe@19-8-2009 10040350@unknown@formal@none@1@S@It's called "Daisy."@@@@1@3@@oe@19-8-2009 10040360@unknown@formal@none@1@S@Daisy, Daisy.@@@@1@2@@oe@19-8-2009 10040370@unknown@formal@none@1@S@Give me your answer do.@@@@1@5@@oe@19-8-2009 10040380@unknown@formal@none@1@S@I am half crazy.@@@@1@4@@oe@19-8-2009 10040390@unknown@formal@none@1@S@All for the love of you.@@@@1@6@@oe@19-8-2009 10040400@unknown@formal@none@1@S@It won't be a stylish marriage.@@@@1@6@@oe@19-8-2009 10040410@unknown@formal@none@1@S@I can't afford a carriage.@@@@1@5@@oe@19-8-2009 10040420@unknown@formal@none@1@S@But you'll look sweet.@@@@1@4@@oe@19-8-2009 10040430@unknown@formal@none@1@S@Upon the seat.@@@@1@3@@oe@19-8-2009 10040440@unknown@formal@none@1@S@Of a bicycle built for two." reminds you of HAL in 2001 a space odyssey ?@@@@1@16@@oe@19-8-2009 10040450@unknown@formal@none@1@S@Does that make sense?@@@@1@4@@oe@19-8-2009 10040460@unknown@formal@none@1@S@: '''Human:''' Can you tell me something about Stanley Kubrick?@@@@1@10@@oe@19-8-2009 10040470@unknown@formal@none@1@S@: '''ALICE:''' Ambiguous: Ask who, what or where is Stanley Kubrick ?@@@@1@12@@oe@19-8-2009 10050010@unknown@formal@none@1@S@
Artificial neural network
@@@@1@3@@danf@17-8-2009 10050020@unknown@formal@none@1@S@An '''artificial neural network (ANN)''', often just called a "neural network" (NN), is a [[mathematical model]] or [[computational model]] based on [[biological neural networks]].@@@@1@24@@danf@17-8-2009 10050030@unknown@formal@none@1@S@It consists of an interconnected group of [[artificial neuron]]s and processes information using a [[connectionism|connectionist]] approach to [[computation]].@@@@1@18@@danf@17-8-2009 10050040@unknown@formal@none@1@S@In most cases an ANN is an [[adaptive system]] that changes its structure based on external or internal information that flows through the network during the learning phase.@@@@1@28@@danf@17-8-2009 10050050@unknown@formal@none@1@S@In more practical terms neural networks are [[non-linear]] [[statistical]] [[data modeling]] tools.@@@@1@12@@danf@17-8-2009 10050060@unknown@formal@none@1@S@They can be used to model complex relationships between inputs and outputs or to [[Pattern recognition|find patterns]] in data.@@@@1@19@@danf@17-8-2009 10050070@unknown@formal@none@1@S@==Background==@@@@1@1@@danf@17-8-2009 10050080@unknown@formal@none@1@S@There is no precise agreed-upon definition among researchers as to what a [[neural network]] is, but most would agree that it involves a network of simple processing elements ([[artificial neuron|neurons]]), which can exhibit complex global behavior, determined by the connections between the processing elements and element parameters.@@@@1@47@@danf@17-8-2009 10050090@unknown@formal@none@1@S@The original inspiration for the technique was from examination of the [[central nervous system]] and the neurons (and their [[axons]], [[dendrites]] and [[synapses]]) which constitute one of its most significant information processing elements (see [[Neuroscience]]).@@@@1@35@@danf@17-8-2009 10050100@unknown@formal@none@1@S@In a neural network model, simple [[Node (neural networks)|nodes]] (called variously "neurons", "neurodes", "PEs" ("processing elements") or "units") are connected together to form a network of nodes — hence the term "neural network."@@@@1@33@@danf@17-8-2009 10050110@unknown@formal@none@1@S@While a neural network does not have to be adaptive per se, its practical use comes with algorithms designed to alter the strength (weights) of the connections in the network to produce a desired signal flow.@@@@1@36@@danf@17-8-2009 10050120@unknown@formal@none@1@S@These networks are also similar to the [[biological neural networks]] in the sense that functions are performed collectively and in parallel by the units, rather than there being a clear delineation of subtasks to which various units are assigned (see also [[connectionism]]).@@@@1@42@@danf@17-8-2009 10050130@unknown@formal@none@1@S@Currently, the term Artificial Neural Network (ANN) tends to refer mostly to neural network models employed in [[statistics]], [[cognitive psychology]] and [[artificial intelligence]].@@@@1@23@@danf@17-8-2009 10050140@unknown@formal@none@1@S@[[Neural network]] models designed with emulation of the [[central nervous system]] (CNS) in mind are a subject of [[theoretical neuroscience]] ([[computational neuroscience]]).@@@@1@22@@danf@17-8-2009 10050150@unknown@formal@none@1@S@In modern [[Neural network software|software implementations]] of artificial neural networks the approach inspired by biology has more or less been abandoned for a more practical approach based on statistics and signal processing.@@@@1@32@@danf@17-8-2009 10050160@unknown@formal@none@1@S@In some of these 
systems neural networks, or parts of neural networks (such as [[artificial neuron]]s) are used as components in larger systems that combine both adaptive and non-adaptive elements.@@@@1@30@@danf@17-8-2009 10050170@unknown@formal@none@1@S@While the more general approach of such [[adaptive systems]] is more suitable for real-world problem solving, it has far less to do with the traditional artificial intelligence connectionist models.@@@@1@29@@danf@17-8-2009 10050180@unknown@formal@none@1@S@What they do, however, have in common is the principle of non-linear, distributed, parallel and local processing and adaptation.@@@@1@19@@danf@17-8-2009 10050190@unknown@formal@none@1@S@===Models===@@@@1@1@@danf@17-8-2009 10050200@unknown@formal@none@1@S@Neural network models in artificial intelligence are usually referred to as artificial neural networks (ANNs); these are essentially simple mathematical models defining a function f : X \\rightarrow Y .@@@@1@31@@danf@17-8-2009 10050210@unknown@formal@none@1@S@Each type of ANN model corresponds to a ''class'' of such functions.@@@@1@12@@danf@17-8-2009 10050220@unknown@formal@none@1@S@====The ''network'' in ''artificial neural network''====@@@@1@6@@danf@17-8-2009 10050230@unknown@formal@none@1@S@The word ''network'' in the term 'artificial neural network' arises because the function f(x) is defined as a composition of other functions g_i(x), which can further be defined as a composition of other functions.@@@@1@34@@danf@17-8-2009 10050240@unknown@formal@none@1@S@This can be conveniently represented as a network structure, with arrows depicting the dependencies between variables.@@@@1@16@@danf@17-8-2009 10050250@unknown@formal@none@1@S@A widely used type of composition is the ''nonlinear weighted sum'', where f (x) = K \\left(\\sum_i w_i g_i(x)\\right) , where K is some predefined function, such as the [[hyperbolic tangent]].@@@@1@31@@danf@17-8-2009 10050260@unknown@formal@none@1@S@It will be convenient for the following to refer to a collection of functions g_i as simply a vector g = (g_1, g_2, \\ldots, g_n).@@@@1@25@@danf@17-8-2009 10050270@unknown@formal@none@1@S@This figure depicts such a decomposition of f, with dependencies between variables indicated by arrows.@@@@1@15@@danf@17-8-2009 10050280@unknown@formal@none@1@S@These can be interpreted in two ways.@@@@1@7@@danf@17-8-2009 10050290@unknown@formal@none@1@S@The first view is the functional view: the input x is transformed into a 3-dimensional vector h, which is then transformed into a 2-dimensional vector g, which is finally transformed into f.@@@@1@32@@danf@17-8-2009 10050300@unknown@formal@none@1@S@This view is most commonly encountered in the context of [[Optimization (mathematics)|optimization]].@@@@1@12@@danf@17-8-2009 10050310@unknown@formal@none@1@S@The second view is the probabilistic view: the [[random variable]] F = f(G) depends upon the random variable G = g(H), which depends upon H=h(X), which depends upon the random variable X.@@@@1@33@@danf@17-8-2009 10050320@unknown@formal@none@1@S@This view is most commonly encountered in the context of [[graphical models]].@@@@1@12@@danf@17-8-2009 10050330@unknown@formal@none@1@S@The two views are largely equivalent.@@@@1@6@@danf@17-8-2009 10050340@unknown@formal@none@1@S@In either case, for this particular network architecture, the components of individual layers are independent of each other (e.g., the components of g are independent of each other given their input h).@@@@1@32@@danf@17-8-2009 10050350@unknown@formal@none@1@S@This naturally enables 
a degree of parallelism in the implementation.@@@@1@10@@danf@17-8-2009 10050360@unknown@formal@none@1@S@Networks such as the previous one are commonly called [[feedforward]], because their graph is a [[directed acyclic graph]].@@@@1@18@@danf@17-8-2009 10050370@unknown@formal@none@1@S@Networks with [[path (graph theory)|cycles]] are commonly called [[Recurrent_neural_network|recurrent]].@@@@1@9@@danf@17-8-2009 10050380@unknown@formal@none@1@S@Such networks are commonly depicted in the manner shown at the top of the figure, where f is shown as being dependent upon itself.@@@@1@24@@danf@17-8-2009 10050390@unknown@formal@none@1@S@However, there is an implied temporal dependence which is not shown.@@@@1@11@@danf@17-8-2009 10050400@unknown@formal@none@1@S@What this actually means in practice is that the value of f at some point in time t depends upon the values of f at zero or at one or more other points in time.@@@@1@35@@danf@17-8-2009 10050410@unknown@formal@none@1@S@The graphical model at the bottom of the figure illustrates the case: the value of f at time t only depends upon its last value.@@@@1@25@@danf@17-8-2009 10050420@unknown@formal@none@1@S@===Learning===@@@@1@1@@danf@17-8-2009 10050430@unknown@formal@none@1@S@However interesting such functions may be in themselves, what has attracted the most interest in neural networks is the possibility of ''learning'', which in practice means the following:@@@@1@28@@danf@17-8-2009 10050440@unknown@formal@none@1@S@Given a specific ''task'' to solve, and a ''class'' of functions F, learning means using a set of ''observations'', in order to find f^* \\in F which solves the task in an ''optimal sense''.@@@@1@34@@danf@17-8-2009 10050450@unknown@formal@none@1@S@This entails defining a [[cost function]] C : F \\rightarrow \\mathbb{R} such that, for the optimal solution f^*, C(f^*) \\leq C(f) \\forall f \\in F (no solution has a cost less than the cost of the optimal solution).@@@@1@38@@danf@17-8-2009 10050460@unknown@formal@none@1@S@The [[cost function]] C is an important concept in learning, as it is a measure of how far away we are from an optimal solution to the problem that we want to solve.@@@@1@33@@danf@17-8-2009 10050470@unknown@formal@none@1@S@Learning algorithms search through the solution space in order to find a function that has the smallest possible cost.@@@@1@19@@danf@17-8-2009 10050480@unknown@formal@none@1@S@For applications where the solution is dependent on some data, the cost must necessarily be a ''function of the observations'', otherwise we would not be modelling anything related to the data.@@@@1@31@@danf@17-8-2009 10050490@unknown@formal@none@1@S@It is frequently defined as a [[statistic]] to which only approximations can be made.@@@@1@14@@danf@17-8-2009 10050500@unknown@formal@none@1@S@As a simple example consider the problem of finding the model f which minimizes C=E\\left[(f(x) - y)^2\\right], for data pairs (x,y) drawn from some distribution \\mathcal{D}.@@@@1@26@@danf@17-8-2009 10050510@unknown@formal@none@1@S@In practical situations we would only have N samples from \\mathcal{D} and thus, for the above example, we would only minimize \\hat{C}=\\frac{1}{N}\\sum_{i=1}^N (f(x_i)-y_i)^2.@@@@1@23@@danf@17-8-2009 10050520@unknown@formal@none@1@S@Thus, the cost is minimized over a sample of the data rather than the true data distribution.@@@@1@17@@danf@17-8-2009 10050530@unknown@formal@none@1@S@When N \\rightarrow \\infty some form of online learning must be used, where the cost is partially minimized as each new example is 
seen.@@@@1@24@@danf@17-8-2009 10050540@unknown@formal@none@1@S@While online learning is often used when \\mathcal{D} is fixed, it is most useful in the case where the distribution changes slowly over time.@@@@1@24@@danf@17-8-2009 10050550@unknown@formal@none@1@S@In neural network methods, some form of online learning is frequently also used for finite datasets.@@@@1@16@@danf@17-8-2009 10050560@unknown@formal@none@1@S@====Choosing a cost function====@@@@1@4@@danf@17-8-2009 10050570@unknown@formal@none@1@S@While it is possible to arbitrarily define some [[ad hoc]] cost function, frequently a particular cost will be used either because it has desirable properties (such as convexity) or because it arises naturally from a particular formulation of the problem (i.e., In a probabilistic formulation the posterior probability of the model can be used as an inverse cost).@@@@1@58@@danf@17-8-2009 10050580@unknown@formal@none@1@S@'''Ultimately, the cost function will depend on the task we wish to perform'''.@@@@1@13@@danf@17-8-2009 10050590@unknown@formal@none@1@S@The three main categories of learning tasks are overviewed below.@@@@1@10@@danf@17-8-2009 10050600@unknown@formal@none@1@S@===Learning paradigms===@@@@1@2@@danf@17-8-2009 10050610@unknown@formal@none@1@S@There are three major learning paradigms, each corresponding to a particular abstract learning task.@@@@1@14@@danf@17-8-2009 10050620@unknown@formal@none@1@S@These are [[supervised learning]], [[unsupervised learning]] and [[reinforcement learning]].@@@@1@9@@danf@17-8-2009 10050630@unknown@formal@none@1@S@Usually any given type of network architecture can be employed in any of those tasks.@@@@1@15@@danf@17-8-2009 10050640@unknown@formal@none@1@S@====Supervised learning====@@@@1@2@@danf@17-8-2009 10050650@unknown@formal@none@1@S@In [[supervised learning]], we are given a set of example pairs (x, y), x \\in X, y \\in Y and the aim is to find a function f : X \\rightarrow Y in the allowed class of functions that matches the examples.@@@@1@45@@danf@17-8-2009 10050660@unknown@formal@none@1@S@In other words, we wish to ''infer'' the mapping implied by the data; the cost function is related to the mismatch between our mapping and the data and it implicitly contains prior knowledge about the problem domain.@@@@1@37@@danf@17-8-2009 10050670@unknown@formal@none@1@S@A commonly used cost is the [[mean-squared error]] which tries to minimize the average error between the network's output, f(x), and the target value y over all the example pairs.@@@@1@30@@danf@17-8-2009 10050680@unknown@formal@none@1@S@When one tries to minimise this cost using [[gradient descent]] for the class of neural networks called [[Multilayer perceptron|Multi-Layer Perceptrons]], one obtains the common and well-known [[Backpropagation|backpropagation algorithm]] for training neural networks.@@@@1@32@@danf@17-8-2009 10050690@unknown@formal@none@1@S@Tasks that fall within the paradigm of supervised learning are [[pattern recognition]] (also known as classification) and [[Regression analysis|regression]] (also known as function approximation).@@@@1@24@@danf@17-8-2009 10050700@unknown@formal@none@1@S@The supervised learning paradigm is also applicable to sequential data (e.g., for speech and gesture recognition).@@@@1@16@@danf@17-8-2009 10050710@unknown@formal@none@1@S@This can be thought of as learning with a "teacher," in the form of a function that provides continuous feedback on the quality of solutions obtained thus far.@@@@1@28@@danf@17-8-2009 
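As an illustration of minimising the mean-squared error by gradient descent, the following minimal sketch fits the simplest possible "network", a single linear unit f(x) = wx + b, to example pairs; backpropagation applies the same kind of update to every weight of a multi-layer perceptron. The data, learning rate and function names here are illustrative assumptions, not part of any standard library.

<syntaxhighlight lang="python">
import random


def train_linear_unit(examples, learning_rate=0.05, epochs=200):
    """Fit f(x) = w*x + b by gradient descent on the mean-squared error.

    Backpropagation generalises exactly this update to every weight of a
    multi-layer perceptron by propagating error derivatives backwards.
    """
    w, b = 0.0, 0.0
    n = len(examples)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in examples) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in examples) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# The "teacher": noisy example pairs (x, y) drawn from y = 3x - 1.
pairs = [(x / 10, 3 * (x / 10) - 1 + random.gauss(0, 0.1)) for x in range(-20, 21)]
print(train_linear_unit(pairs))   # roughly (3.0, -1.0)
</syntaxhighlight>
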
10050720@unknown@formal@none@1@S@====Unsupervised learning====@@@@1@2@@danf@17-8-2009 10050730@unknown@formal@none@1@S@In [[unsupervised learning]] we are given some data x, and the cost function to be minimized can be any function of the data x and the network's output, f.@@@@1@29@@danf@17-8-2009 10050740@unknown@formal@none@1@S@The cost function is dependent on the task (what we are trying to model) and our ''a priori'' assumptions (the implicit properties of our model, its parameters and the observed variables).@@@@1@31@@danf@17-8-2009 10050750@unknown@formal@none@1@S@As a trivial example, consider the model f(x) = a, where a is a constant and the cost C=E[(x - f(x))^2].@@@@1@21@@danf@17-8-2009 10050760@unknown@formal@none@1@S@Minimizing this cost will give us a value of a that is equal to the mean of the data.@@@@1@19@@danf@17-8-2009 10050770@unknown@formal@none@1@S@The cost function can be much more complicated.@@@@1@8@@danf@17-8-2009 10050780@unknown@formal@none@1@S@Its form depends on the application: For example in compression it could be related to the [[mutual information]] between x and y.@@@@1@22@@danf@17-8-2009 10050790@unknown@formal@none@1@S@In statistical modelling, it could be related to the [[posterior probability]] of the model given the data.@@@@1@17@@danf@17-8-2009 10050800@unknown@formal@none@1@S@(Note that in both of those examples those quantities would be maximized rather than minimised).@@@@1@15@@danf@17-8-2009 10050810@unknown@formal@none@1@S@Tasks that fall within the paradigm of unsupervised learning are in general [[estimation]] problems; the applications include [[Data clustering|clustering]], the estimation of [[statistical distributions]], [[Data compression|compression]] and [[Bayesian spam filtering|filtering]].@@@@1@30@@danf@17-8-2009 10050820@unknown@formal@none@1@S@====Reinforcement learning====@@@@1@2@@danf@17-8-2009 10050830@unknown@formal@none@1@S@In [[reinforcement learning]], data x is usually not given, but generated by an agent's interactions with the environment.@@@@1@18@@danf@17-8-2009 10050840@unknown@formal@none@1@S@At each point in time t, the agent performs an action y_t and the environment generates an observation x_t and an instantaneous cost c_t, according to some (usually unknown) dynamics.@@@@1@30@@danf@17-8-2009 10050850@unknown@formal@none@1@S@The aim is to discover a ''policy'' for selecting actions that minimizes some measure of a long-term cost, i.e. the expected cumulative cost.@@@@1@23@@danf@17-8-2009 10050860@unknown@formal@none@1@S@The environment's dynamics and the long-term cost for each policy are usually unknown, but can be estimated.@@@@1@17@@danf@17-8-2009 10050870@unknown@formal@none@1@S@More formally, the environment is modeled as a [[Markov decision process]] (MDP) with states {s_1,...,s_n}\\in S and actions {a_1,...,a_m} \\in A with the following probability distributions: the instantaneous cost distribution P(c_t|s_t), the observation distribution P(x_t|s_t) and the transition P(s_{t+1}|s_t, a_t), while a policy is defined as conditional distribution over actions given the observations.@@@@1@53@@danf@17-8-2009 10050880@unknown@formal@none@1@S@Taken together, the two define a [[Markov chain]] (MC).@@@@1@9@@danf@17-8-2009 10050890@unknown@formal@none@1@S@The aim is to discover the policy that minimizes the cost, i.e. 
the MC for which the cost is minimal.@@@@1@20@@danf@17-8-2009 10050900@unknown@formal@none@1@S@ANNs are frequently used in reinforcement learning as part of the overall algorithm.@@@@1@13@@danf@17-8-2009 10050910@unknown@formal@none@1@S@Tasks that fall within the paradigm of reinforcement learning are control problems, [[game|games]] and other [[sequential decision making]] tasks.@@@@1@19@@danf@17-8-2009 10050920@unknown@formal@none@1@S@See also: [[dynamic programming]], [[stochastic control]].@@@@1@6@@danf@17-8-2009 10050930@unknown@formal@none@1@S@===Learning algorithms===@@@@1@2@@danf@17-8-2009 10050940@unknown@formal@none@1@S@Training a neural network model essentially means selecting one model from the set of allowed models (or, in a [[Bayesian]] framework, determining a distribution over the set of allowed models) that minimises the cost criterion.@@@@1@35@@danf@17-8-2009 10050950@unknown@formal@none@1@S@There are numerous algorithms available for training neural network models; most of them can be viewed as a straightforward application of [[Optimization (mathematics)|optimization]] theory and [[statistical estimation]].@@@@1@27@@danf@17-8-2009 10050960@unknown@formal@none@1@S@Most of the algorithms used in training artificial neural networks employ some form of [[gradient descent]].@@@@1@17@@danf@17-8-2009 10050970@unknown@formal@none@1@S@This is done by simply taking the derivative of the cost function with respect to the network parameters and then changing those parameters in a [[gradient-related]] direction.@@@@1@27@@danf@17-8-2009 10050980@unknown@formal@none@1@S@[[Evolutionary methods]], [[simulated annealing]], [[expectation-maximization]] and [[non-parametric methods]] are other commonly used methods for training neural networks.@@@@1@19@@danf@17-8-2009 10050990@unknown@formal@none@1@S@See also [[machine learning]].@@@@1@4@@danf@17-8-2009 10051000@unknown@formal@none@1@S@Temporal perceptual learning relies on finding temporal relationships in sensory signal streams.@@@@1@12@@danf@17-8-2009 10051010@unknown@formal@none@1@S@In an environment, statistically salient temporal correlations can be found by monitoring the arrival times of sensory signals.@@@@1@18@@danf@17-8-2009 10051020@unknown@formal@none@1@S@This is done by the [[perceptual network]].@@@@1@7@@danf@17-8-2009 10051030@unknown@formal@none@1@S@==Employing artificial neural networks==@@@@1@4@@danf@17-8-2009 10051040@unknown@formal@none@1@S@Perhaps the greatest advantage of ANNs is their ability to be used as an arbitrary function approximation mechanism which 'learns' from observed data.@@@@1@23@@danf@17-8-2009 10051050@unknown@formal@none@1@S@However, using them is not so straightforward and a relatively good understanding of the underlying theory is essential.@@@@1@18@@danf@17-8-2009 10051060@unknown@formal@none@1@S@*Choice of model: This will depend on the data representation and the application.@@@@1@13@@danf@17-8-2009 10051070@unknown@formal@none@1@S@Overly complex models tend to lead to problems with learning.@@@@1@10@@danf@17-8-2009 10051080@unknown@formal@none@1@S@*Learning algorithm: There are numerous tradeoffs between learning algorithms.@@@@1@9@@danf@17-8-2009 10051090@unknown@formal@none@1@S@Almost any algorithm will work well with the ''correct [[hyperparameter]]s'' for training on a particular fixed dataset.@@@@1@17@@danf@17-8-2009 10051100@unknown@formal@none@1@S@However, selecting and tuning an algorithm for training on unseen data requires a significant amount of 
experimentation.@@@@1@17@@danf@17-8-2009 10051110@unknown@formal@none@1@S@*Robustness: If the model, cost function and learning algorithm are selected appropriately the resulting ANN can be extremely robust.@@@@1@19@@danf@17-8-2009 10051120@unknown@formal@none@1@S@With the correct implementation ANNs can be used naturally in [[online algorithm|online learning]] and large dataset applications.@@@@1@17@@danf@17-8-2009 10051130@unknown@formal@none@1@S@Their simple implementation and the existence of mostly local dependencies exhibited in the structure allows for fast, parallel implementations in hardware.@@@@1@21@@danf@17-8-2009 10051140@unknown@formal@none@1@S@==Applications==@@@@1@1@@danf@17-8-2009 10051150@unknown@formal@none@1@S@The utility of artificial neural network models lies in the fact that they can be used to infer a function from observations.@@@@1@22@@danf@17-8-2009 10051160@unknown@formal@none@1@S@This is particularly useful in applications where the complexity of the data or task makes the design of such a function by hand impractical.@@@@1@24@@danf@17-8-2009 10051170@unknown@formal@none@1@S@===Real life applications===@@@@1@3@@danf@17-8-2009 10051180@unknown@formal@none@1@S@The tasks to which artificial neural networks are applied tend to fall within the following broad categories:@@@@1@17@@danf@17-8-2009 10051190@unknown@formal@none@1@S@*[[Function approximation]], or [[regression analysis]], including [[time series prediction]] and modeling.@@@@1@11@@danf@17-8-2009 10051200@unknown@formal@none@1@S@*[[Statistical classification|Classification]], including [[Pattern recognition|pattern]] and sequence recognition, [[novelty detection]] and sequential decision making.@@@@1@14@@danf@17-8-2009 10051210@unknown@formal@none@1@S@*[[Data processing]], including filtering, clustering, blind source separation and compression.@@@@1@10@@danf@17-8-2009 10051220@unknown@formal@none@1@S@Application areas include system identification and control (vehicle control, process control), game-playing and decision making (backgammon, chess, racing), pattern recognition (radar systems, face identification, object recognition and more), sequence recognition (gesture, speech, handwritten text recognition), medical diagnosis, financial applications (automated trading systems), [[data mining]] (or knowledge discovery in databases, "KDD"), visualization and [[e-mail spam]] filtering.@@@@1@55@@danf@17-8-2009 10051230@unknown@formal@none@1@S@==Neural network software==@@@@1@3@@danf@17-8-2009 10051240@unknown@formal@none@1@S@'''Neural network software''' is used to [[Simulation|simulate]], [[research]], [[technology development|develop]] and apply artificial neural networks, [[biological neural network]]s and in some cases a wider array of [[adaptive system]]s.@@@@1@28@@danf@17-8-2009 10051250@unknown@formal@none@1@S@See also [[logistic regression]].@@@@1@4@@danf@17-8-2009 10051260@unknown@formal@none@1@S@==Types of neural networks==@@@@1@4@@danf@17-8-2009 10051270@unknown@formal@none@1@S@===Feedforward neural network===@@@@1@3@@danf@17-8-2009 10051280@unknown@formal@none@1@S@The feedforward neural network was the first and arguably simplest type of artificial neural network devised.@@@@1@16@@danf@17-8-2009 10051290@unknown@formal@none@1@S@In this network, the information moves in only one direction, forward, from the input nodes, through the hidden nodes (if any) and to the output nodes.@@@@1@26@@danf@17-8-2009 10051300@unknown@formal@none@1@S@There are no cycles or loops in the 
network.@@@@1@9@@danf@17-8-2009 10051310@unknown@formal@none@1@S@===Radial basis function (RBF) network===@@@@1@5@@danf@17-8-2009 10051320@unknown@formal@none@1@S@Radial Basis Functions are powerful techniques for interpolation in multidimensional space.@@@@1@11@@danf@17-8-2009 10051330@unknown@formal@none@1@S@An RBF is a function which has a built-in distance criterion with respect to a centre.@@@@1@17@@danf@17-8-2009 10051340@unknown@formal@none@1@S@Radial basis functions have been applied in the area of neural networks where they may be used as a replacement for the sigmoidal hidden layer transfer characteristic in Multi-Layer Perceptrons.@@@@1@30@@danf@17-8-2009 10051350@unknown@formal@none@1@S@RBF networks have two layers of processing: In the first, input is mapped onto each RBF in the 'hidden' layer.@@@@1@20@@danf@17-8-2009 10051360@unknown@formal@none@1@S@The RBF chosen is usually a Gaussian.@@@@1@7@@danf@17-8-2009 10051370@unknown@formal@none@1@S@In regression problems the output layer is then a linear combination of hidden layer values representing mean predicted output.@@@@1@19@@danf@17-8-2009 10051380@unknown@formal@none@1@S@The interpretation of this output layer value is the same as a regression model in statistics.@@@@1@16@@danf@17-8-2009 10051390@unknown@formal@none@1@S@In classification problems the output layer is typically a [[sigmoid function]] of a linear combination of hidden layer values, representing a posterior probability.@@@@1@23@@danf@17-8-2009 10051400@unknown@formal@none@1@S@Performance in both cases is often improved by shrinkage techniques, known as [[ridge regression]] in classical statistics and known to correspond to a prior belief in small parameter values (and therefore smooth output functions) in a Bayesian framework.@@@@1@38@@danf@17-8-2009 10051410@unknown@formal@none@1@S@RBF networks have the advantage of not suffering from local minima in the same way as Multi-Layer Perceptrons.@@@@1@18@@danf@17-8-2009 10051420@unknown@formal@none@1@S@This is because the only parameters that are adjusted in the learning process are those of the linear mapping from hidden layer to output layer.@@@@1@23@@danf@17-8-2009 10051430@unknown@formal@none@1@S@Linearity ensures that the error surface is quadratic and therefore has a single easily found minimum.@@@@1@16@@danf@17-8-2009 10051440@unknown@formal@none@1@S@In regression problems this can be found in one matrix operation.@@@@1@11@@danf@17-8-2009 10051450@unknown@formal@none@1@S@In classification problems the fixed non-linearity introduced by the sigmoid output function is most efficiently dealt with using [[iteratively re-weighted least squares]].@@@@1@22@@danf@17-8-2009 10051460@unknown@formal@none@1@S@RBF networks have the disadvantage of requiring good coverage of the input space by radial basis functions.@@@@1@17@@danf@17-8-2009 10051470@unknown@formal@none@1@S@RBF centres are determined with reference to the distribution of the input data, but without reference to the prediction task.@@@@1@20@@danf@17-8-2009 10051480@unknown@formal@none@1@S@As a result, representational resources may be wasted on areas of the input space that are irrelevant to the learning task.@@@@1@21@@danf@17-8-2009 10051490@unknown@formal@none@1@S@A common solution is to associate each data point with its own centre, although this can make the linear system to be solved in the final layer rather large, and requires shrinkage techniques to avoid overfitting.@@@@1@36@@danf@17-8-2009 10051500@unknown@formal@none@1@S@Associating each input datum with
an RBF leads naturally to kernel methods such as [[Support Vector Machine]]s and Gaussian Processes (the RBF is the kernel function).@@@@1@26@@danf@17-8-2009 10051510@unknown@formal@none@1@S@All three approaches use a non-linear kernel function to project the input data into a space where the learning problem can be solved using a linear model.@@@@1@27@@danf@17-8-2009 10051520@unknown@formal@none@1@S@Like Gaussian Processes, and unlike SVMs, RBF networks are typically trained in a Maximum Likelihood framework by maximizing the probability (minimizing the error) of the data under the model.@@@@1@29@@danf@17-8-2009 10051530@unknown@formal@none@1@S@SVMs take a different approach to avoiding overfitting, maximizing a margin instead.@@@@1@13@@danf@17-8-2009 10051540@unknown@formal@none@1@S@RBF networks are outperformed in most classification applications by SVMs.@@@@1@10@@danf@17-8-2009 10051550@unknown@formal@none@1@S@In regression applications they can be competitive when the dimensionality of the input space is relatively small.@@@@1@17@@danf@17-8-2009 10051560@unknown@formal@none@1@S@===Kohonen self-organizing network===@@@@1@3@@danf@17-8-2009 10051570@unknown@formal@none@1@S@The self-organizing map (SOM) invented by [[Teuvo Kohonen]] uses a form of [[unsupervised learning]].@@@@1@14@@danf@17-8-2009 10051580@unknown@formal@none@1@S@A set of artificial neurons learn to map points in an input space to coordinates in an output space.@@@@1@19@@danf@17-8-2009 10051590@unknown@formal@none@1@S@The input space can have different dimensions and topology from the output space, and the SOM will attempt to preserve these.@@@@1@21@@danf@17-8-2009 10051600@unknown@formal@none@1@S@===Recurrent network===@@@@1@2@@danf@17-8-2009 10051610@unknown@formal@none@1@S@Contrary to feedforward networks, [[recurrent neural network]]s (RNs) are models with bi-directional data flow.@@@@1@14@@danf@17-8-2009 10051620@unknown@formal@none@1@S@While a feedforward network propagates data linearly from input to output, RNs also propagate data from later processing stages to earlier stages.@@@@1@22@@danf@17-8-2009 10051630@unknown@formal@none@1@S@====Simple recurrent network====@@@@1@3@@danf@17-8-2009 10051640@unknown@formal@none@1@S@A ''simple recurrent network'' (SRN) is a variation on the Multi-Layer Perceptron, sometimes called an "Elman network" due to its invention by [[Jeff Elman]].@@@@1@24@@danf@17-8-2009 10051650@unknown@formal@none@1@S@A three-layer network is used, with the addition of a set of "context units" in the input layer.@@@@1@18@@danf@17-8-2009 10051660@unknown@formal@none@1@S@There are connections from the middle (hidden) layer to these context units fixed with a weight of one.@@@@1@18@@danf@17-8-2009 10051670@unknown@formal@none@1@S@At each time step, the input is propagated in a standard feed-forward fashion, and then a learning rule (usually back-propagation) is applied.@@@@1@22@@danf@17-8-2009 10051680@unknown@formal@none@1@S@The fixed back connections result in the context units always maintaining a copy of the previous values of the hidden units (since they propagate over the connections before the learning rule is applied).@@@@1@33@@danf@17-8-2009 10051690@unknown@formal@none@1@S@Thus the network can maintain a sort of state, allowing it to perform such tasks as sequence-prediction that are beyond the power of a standard Multi-Layer Perceptron.@@@@1@27@@danf@17-8-2009 10051700@unknown@formal@none@1@S@In a ''fully recurrent network'', every neuron receives inputs from every other neuron in the
network.@@@@1@16@@danf@17-8-2009 10051710@unknown@formal@none@1@S@These networks are not arranged in layers.@@@@1@7@@danf@17-8-2009 10051720@unknown@formal@none@1@S@Usually only a subset of the neurons receive external inputs in addition to the inputs from all the other neurons, and another disjoint subset of neurons report their output externally as well as sending it to all the neurons.@@@@1@39@@danf@17-8-2009 10051730@unknown@formal@none@1@S@These distinctive inputs and outputs perform the function of the input and output layers of a feed-forward or simple recurrent network, and also join all the other neurons in the recurrent processing.@@@@1@32@@danf@17-8-2009 10051740@unknown@formal@none@1@S@====Hopfield network====@@@@1@2@@danf@17-8-2009 10051750@unknown@formal@none@1@S@The [[Hopfield network]] is a recurrent neural network in which all connections are symmetric.@@@@1@14@@danf@17-8-2009 10051760@unknown@formal@none@1@S@Invented by [[John Hopfield]] in 1982, this network guarantees that its dynamics will converge.@@@@1@14@@danf@17-8-2009 10051770@unknown@formal@none@1@S@If the connections are trained using [[Hebbian learning]] then the Hopfield network can perform as robust content-addressable (or [[associative memory|associative]]) memory, resistant to connection alteration.@@@@1@25@@danf@17-8-2009 10051780@unknown@formal@none@1@S@====Echo state network====@@@@1@3@@danf@17-8-2009 10051790@unknown@formal@none@1@S@The [[echo state network]] (ESN) is a [[recurrent neural network]] with a sparsely connected random hidden layer.@@@@1@17@@danf@17-8-2009 10051800@unknown@formal@none@1@S@The weights of output neurons are the only part of the network that can change and be learned.@@@@1@18@@danf@17-8-2009 10051810@unknown@formal@none@1@S@ESNs are good at (re)producing temporal patterns.@@@@1@7@@danf@17-8-2009 10051820@unknown@formal@none@1@S@====Long short term memory network====@@@@1@5@@danf@17-8-2009 10051830@unknown@formal@none@1@S@The [[Long short term memory]] is an artificial neural net structure that, unlike traditional RNNs, does not have the problem of vanishing gradients.@@@@1@22@@danf@17-8-2009 10051840@unknown@formal@none@1@S@It can therefore use long delays and can handle signals that have a mix of low and high frequency components.@@@@1@20@@danf@17-8-2009 10051850@unknown@formal@none@1@S@===Stochastic neural networks===@@@@1@3@@danf@17-8-2009 10051860@unknown@formal@none@1@S@A [[stochastic neural network]] differs from a typical neural network because it introduces random variations into the network.@@@@1@18@@danf@17-8-2009 10051870@unknown@formal@none@1@S@In a probabilistic view of neural networks, such random variations can be viewed as a form of [[statistical sampling]], such as [[Monte Carlo sampling]].@@@@1@24@@danf@17-8-2009 10051880@unknown@formal@none@1@S@====Boltzmann machine====@@@@1@2@@danf@17-8-2009 10051890@unknown@formal@none@1@S@The [[Boltzmann machine]] can be thought of as a noisy Hopfield network.@@@@1@12@@danf@17-8-2009 10051900@unknown@formal@none@1@S@Invented by [[Geoff Hinton]] and [[Terry Sejnowski]] in 1985, the Boltzmann machine is important because it is one of the first neural networks to demonstrate learning of latent variables (hidden units).@@@@1@31@@danf@17-8-2009 10051910@unknown@formal@none@1@S@Boltzmann machine learning was at first slow to simulate, but the [[contrastive divergence algorithm]] of Geoff Hinton (circa 2000) allows models such as Boltzmann machines and ''products of experts'' to be trained much faster.@@@@1@34@@danf@17-8-2009
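To make the Hopfield discussion above concrete, the following is a minimal illustrative sketch (not taken from the article or any particular library) of a Hopfield network used as content-addressable memory: symmetric weights are set from stored ±1 patterns with the Hebbian rule, and repeated asynchronous updates drive a corrupted probe towards a stored pattern. The toy patterns and function names are illustrative assumptions.
 # Minimal Hopfield-network sketch (illustrative assumption, not the article's own code).
 import numpy as np
 
 def train_hebbian(patterns):
     # Hebbian rule: average of outer products of the stored +/-1 patterns,
     # with the diagonal zeroed so that no neuron connects to itself.
     n = patterns.shape[1]
     W = np.zeros((n, n))
     for p in patterns:
         W += np.outer(p, p)
     np.fill_diagonal(W, 0)
     return W / len(patterns)
 
 def recall(W, state, steps=5):
     # Asynchronous updates; with symmetric weights the energy never increases,
     # so the state settles into a fixed point (a stored or spurious pattern).
     state = state.copy()
     for _ in range(steps):
         for i in np.random.permutation(len(state)):
             state[i] = 1 if W[i] @ state >= 0 else -1
     return state
 
 patterns = np.array([[1, -1, 1, -1, 1, -1],
                      [1, 1, 1, -1, -1, -1]])
 W = train_hebbian(patterns)
 probe = np.array([1, -1, 1, -1, 1, 1])  # the first pattern with its last bit flipped
 print(recall(W, probe))                 # recovers [1, -1, 1, -1, 1, -1]
In this sketch the probe differs from a stored pattern in a single position, so the update rule corrects it; a Boltzmann machine would replace the deterministic threshold with a stochastic (temperature-dependent) update.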
10051920@unknown@formal@none@1@S@===Modular neural networks===@@@@1@3@@danf@17-8-2009 10051930@unknown@formal@none@1@S@Biological studies showed that the human brain functions not as a single massive network, but as a collection of small networks.@@@@1@21@@danf@17-8-2009 10051940@unknown@formal@none@1@S@This realisation gave birth to the concept of [[modular neural networks]], in which several small networks cooperate or compete to solve problems.@@@@1@22@@danf@17-8-2009 10051950@unknown@formal@none@1@S@====Committee of machines====@@@@1@3@@danf@17-8-2009 10051960@unknown@formal@none@1@S@A [[Committee machine|committee of machines]] (CoM) is a collection of different neural networks that together "vote" on a given example.@@@@1@20@@danf@17-8-2009 10051970@unknown@formal@none@1@S@This generally gives a much better result compared to other neural network models.@@@@1@13@@danf@17-8-2009 10051980@unknown@formal@none@1@S@In fact in many cases, starting with the same architecture and training but using different initial random weights gives vastly different networks.@@@@1@22@@danf@17-8-2009 10051990@unknown@formal@none@1@S@A CoM tends to stabilize the result.@@@@1@7@@danf@17-8-2009 10052000@unknown@formal@none@1@S@The CoM is similar to the general [[machine learning]] ''[[bootstrap Aggregating|bagging]]'' method, except that the necessary variety of machines in the committee is obtained by training from different random starting weights rather than training on different randomly selected subsets of the training data.@@@@1@43@@danf@17-8-2009 10052010@unknown@formal@none@1@S@====Associative neural network (ASNN)====@@@@1@4@@danf@17-8-2009 10052020@unknown@formal@none@1@S@The ASNN is an extension of the ''committee of machines'' that goes beyond a simple/weighted average of different models.@@@@1@19@@danf@17-8-2009 10052025@unknown@formal@none@1@S@[http://cogprints.soton.ac.uk/documents/disk0/00/00/14/41/index.html ASNN] represents a combination of an ensemble of feed-forward neural networks and the k-nearest neighbor technique ([[Nearest_neighbor_(pattern_recognition)|kNN]]).@@@@1@18@@danf@17-8-2009 10052030@unknown@formal@none@1@S@It uses the correlation between ensemble responses as a measure of '''distance''' amid the analyzed cases for the kNN.@@@@1@19@@danf@17-8-2009 10052040@unknown@formal@none@1@S@This corrects the bias of the neural network ensemble.@@@@1@9@@danf@17-8-2009 10052050@unknown@formal@none@1@S@An associative neural network has a memory that can coincide with the training set.@@@@1@14@@danf@17-8-2009 10052060@unknown@formal@none@1@S@If new data becomes available, the network instantly improves its predictive ability and provides data approximation (self-learn the data) without a need to retrain the ensemble.@@@@1@26@@danf@17-8-2009 10052070@unknown@formal@none@1@S@Another important feature of ASNN is the possibility to interpret neural network results by analysis of correlations between data cases in the space of models.@@@@1@25@@danf@17-8-2009 10052080@unknown@formal@none@1@S@The method is demonstrated at [http://www.vcclab.org/lab/asnn www.vcclab.org], where you can either use it online or download it.@@@@1@17@@danf@17-8-2009 10052090@unknown@formal@none@1@S@===Other types of networks===@@@@1@4@@danf@17-8-2009 10052100@unknown@formal@none@1@S@These special networks do not fit in any of the previous categories.@@@@1@12@@danf@17-8-2009 10052110@unknown@formal@none@1@S@====Holographic associative memory====@@@@1@3@@danf@17-8-2009 10052120@unknown@formal@none@1@S@[[Holographic 
associative memory|''Holographic associative memory'']] represents a family of analog, correlation-based, associative, stimulus-response memories, where information is mapped onto the phase orientation of complex numbers.@@@@1@26@@danf@17-8-2009 10052130@unknown@formal@none@1@S@====Instantaneously trained networks====@@@@1@3@@danf@17-8-2009 10052140@unknown@formal@none@1@S@''[[Instantaneously trained neural networks]]'' (ITNNs) were inspired by the phenomenon of short-term learning that seems to occur instantaneously.@@@@1@18@@danf@17-8-2009 10052150@unknown@formal@none@1@S@In these networks the weights of the hidden and the output layers are mapped directly from the training vector data.@@@@1@20@@danf@17-8-2009 10052160@unknown@formal@none@1@S@Ordinarily, they work on binary data, but versions for continuous data that require a small amount of additional processing are also available.@@@@1@19@@danf@17-8-2009 10052170@unknown@formal@none@1@S@====Spiking neural networks====@@@@1@3@@danf@17-8-2009 10052180@unknown@formal@none@1@S@[[Spiking neural network]]s (SNNs) are models which explicitly take into account the timing of inputs.@@@@1@15@@danf@17-8-2009 10052190@unknown@formal@none@1@S@The network input and output are usually represented as series of spikes (delta function or more complex shapes).@@@@1@18@@danf@17-8-2009 10052200@unknown@formal@none@1@S@SNNs have the advantage of being able to process information in the [[time domain]] (signals that vary over time).@@@@1@19@@danf@17-8-2009 10052210@unknown@formal@none@1@S@They are often implemented as recurrent networks.@@@@1@7@@danf@17-8-2009 10052220@unknown@formal@none@1@S@SNNs are also a form of [[pulse computer]].@@@@1@8@@danf@17-8-2009 10052230@unknown@formal@none@1@S@Networks of spiking neurons — and the temporal correlations of neural assemblies in such networks — have been used to model figure/ground separation and region linking in the visual system (see e.g.
Reitboeck et al. in Haken and Stadler: Synergetics of the Brain.@@@@1@41@@danf@17-8-2009 10052240@unknown@formal@none@1@S@Berlin, 1989).@@@@1@2@@danf@17-8-2009 10052250@unknown@formal@none@1@S@Gerstner and Kistler have a freely available online textbook on [http://diwww.epfl.ch/~gerstner/BUCH.html Spiking Neuron Models].@@@@1@14@@danf@17-8-2009 10052260@unknown@formal@none@1@S@Spiking neural networks with axonal conduction delays exhibit polychronization, and hence could have a potentially unlimited memory capacity.@@@@1@18@@danf@17-8-2009 10052270@unknown@formal@none@1@S@In June 2005 [[IBM]] announced construction of a [[Blue Gene]] [[supercomputer]] dedicated to the simulation of a large recurrent spiking neural network [http://domino.research.ibm.com/comm/pr.nsf/pages/news.20050606_CognitiveIntelligence.html].@@@@1@23@@danf@17-8-2009 10052280@unknown@formal@none@1@S@====Dynamic neural networks====@@@@1@3@@danf@17-8-2009 10052290@unknown@formal@none@1@S@[[Dynamic neural network]]s not only deal with nonlinear multivariate behaviour, but also include (learning of) time-dependent behaviour such as various transient phenomena and delay effects.@@@@1@25@@danf@17-8-2009 10052300@unknown@formal@none@1@S@====Cascading neural networks====@@@@1@3@@danf@17-8-2009 10052310@unknown@formal@none@1@S@''Cascade-Correlation'' is an architecture and [[supervised learning]] [[algorithm]] developed by [[Scott Fahlman]] and [[Christian Lebiere]].@@@@1@15@@danf@17-8-2009 10052320@unknown@formal@none@1@S@Instead of just adjusting the weights in a network of fixed topology, Cascade-Correlation begins with a minimal network, then automatically trains and adds new hidden units one by one, creating a multi-layer structure.@@@@1@33@@danf@17-8-2009 10052330@unknown@formal@none@1@S@Once a new hidden unit has been added to the network, its input-side weights are frozen.@@@@1@16@@danf@17-8-2009 10052340@unknown@formal@none@1@S@This unit then becomes a permanent feature-detector in the network, available for producing outputs or for creating other, more complex feature detectors.@@@@1@22@@danf@17-8-2009 10052350@unknown@formal@none@1@S@The Cascade-Correlation architecture has several advantages over existing algorithms: it learns very quickly, the network determines its own size and topology, it retains the structures it has built even if the training set changes, and it requires no [[back-propagation]] of error signals through the connections of the network.@@@@1@48@@danf@17-8-2009 10052360@unknown@formal@none@1@S@See: [[Cascade correlation algorithm]].@@@@1@4@@danf@17-8-2009 10052370@unknown@formal@none@1@S@====Neuro-fuzzy networks====@@@@1@2@@danf@17-8-2009 10052380@unknown@formal@none@1@S@A neuro-fuzzy network is a [[fuzzy inference system]] in the body of an artificial neural network.@@@@1@16@@danf@17-8-2009 10052390@unknown@formal@none@1@S@Depending on the ''FIS'' type, there are several layers that simulate the processes involved in a ''fuzzy inference'' like fuzzification, inference, aggregation and defuzzification.@@@@1@24@@danf@17-8-2009 10052400@unknown@formal@none@1@S@Embedding an ''FIS'' in a general structure of an ''ANN'' has the benefit of using available ''ANN'' training methods to find the parameters of a fuzzy system.@@@@1@27@@danf@17-8-2009 10052410@unknown@formal@none@1@S@====Holosemantic neural networks====@@@@1@3@@danf@17-8-2009 10052420@unknown@formal@none@1@S@The holosemantic neural network invented by Manfred Hoffleisch uses a kind of genetic algorithm to build a multidimensional
structure.@@@@1@19@@danf@17-8-2009 10052430@unknown@formal@none@1@S@It takes into account the timing of inputs.@@@@1@8@@danf@17-8-2009 10052440@unknown@formal@none@1@S@====Compositional pattern-producing networks====@@@@1@3@@danf@17-8-2009 10052450@unknown@formal@none@1@S@[[Compositional pattern-producing network]]s (CPPNs) are a variation of ANNs which differ in their set of activation functions and how they are applied.@@@@1@22@@danf@17-8-2009 10052460@unknown@formal@none@1@S@While typical ANNs often contain only [[sigmoid function]]s (and sometimes [[Gaussian function]]s), CPPNs can include both types of functions and many others.@@@@1@22@@danf@17-8-2009 10052470@unknown@formal@none@1@S@Furthermore, unlike typical ANNs, CPPNs are applied across the entire space of possible inputs so that they can represent a complete image.@@@@1@22@@danf@17-8-2009 10052480@unknown@formal@none@1@S@Since they are compositions of functions, CPPNs in effect encode images at infinite resolution and can be sampled for a particular display at whatever resolution is optimal.@@@@1@27@@danf@17-8-2009 10052490@unknown@formal@none@1@S@==Theoretical properties==@@@@1@2@@danf@17-8-2009 10052500@unknown@formal@none@1@S@===Computational power===@@@@1@2@@danf@17-8-2009 10052510@unknown@formal@none@1@S@The multi-layer perceptron (MLP) is a universal function approximator, as proven by the [[Cybenko theorem]].@@@@1@15@@danf@17-8-2009 10052520@unknown@formal@none@1@S@However, the proof is not constructive regarding the number of neurons required or the settings of the weights.@@@@1@18@@danf@17-8-2009 10052530@unknown@formal@none@1@S@Work by [[Hava T. Siegelmann]] and [[Eduardo D. Sontag]] has provided a proof that a specific recurrent architecture with rational valued weights (as opposed to the commonly used floating point approximations) has the full power of a [[Universal Turing Machine]].@@@@1@40@@danf@17-8-2009 10052540@unknown@formal@none@1@S@They have further shown that the use of irrational values for weights results in a machine with trans-Turing power.@@@@1@19@@danf@17-8-2009 10052550@unknown@formal@none@1@S@===Capacity===@@@@1@1@@danf@17-8-2009 10052560@unknown@formal@none@1@S@Artificial neural network models have a property called 'capacity', which roughly corresponds to their ability to model any given function.@@@@1@20@@danf@17-8-2009 10052570@unknown@formal@none@1@S@It is related to the amount of information that can be stored in the network and to the notion of complexity.@@@@1@21@@danf@17-8-2009 10052580@unknown@formal@none@1@S@===Convergence===@@@@1@1@@danf@17-8-2009 10052590@unknown@formal@none@1@S@Nothing can be said in general about convergence since it depends on a number of factors.@@@@1@16@@danf@17-8-2009 10052600@unknown@formal@none@1@S@Firstly, there may exist many local minima.@@@@1@7@@danf@17-8-2009 10052610@unknown@formal@none@1@S@This depends on the cost function and the model.@@@@1@9@@danf@17-8-2009 10052620@unknown@formal@none@1@S@Secondly, the optimization method used might not be guaranteed to converge when far away from a local minimum.@@@@1@18@@danf@17-8-2009 10052630@unknown@formal@none@1@S@Thirdly, for a very large amount of data or parameters, some methods become impractical.@@@@1@14@@danf@17-8-2009 10052640@unknown@formal@none@1@S@In general, it has been found that theoretical guarantees regarding convergence are not always a very reliable guide to practical application.@@@@1@21@@danf@17-8-2009 10052650@unknown@formal@none@1@S@===Generalisation and statistics===@@@@1@3@@danf@17-8-2009 
10052660@unknown@formal@none@1@S@In applications where the goal is to create a system that generalises well in unseen examples, the problem of overtraining has emerged.@@@@1@22@@danf@17-8-2009 10052670@unknown@formal@none@1@S@This arises in overcomplex or overspecified systems when the capacity of the network significantly exceeds the needed free parameters.@@@@1@19@@danf@17-8-2009 10052680@unknown@formal@none@1@S@There are two schools of thought for avoiding this problem: The first is to use cross-validation and similar techniques to check for the presence of overtraining and optimally select hyperparameters such as to minimise the generalisation error.@@@@1@37@@danf@17-8-2009 10052690@unknown@formal@none@1@S@The second is to use some form of ''regularisation''.@@@@1@9@@danf@17-8-2009 10052700@unknown@formal@none@1@S@This is a concept that emerges naturally in a probabilistic (Bayesian) framework, where the regularisation can be performed by putting a larger prior probability over simpler models; but also in statistical learning theory, where the goal is to minimise over two quantities: the 'empirical risk' and the 'structural risk', which roughly correspond to the error over the training set and the predicted error in unseen data due to overfitting.@@@@1@69@@danf@17-8-2009 10052710@unknown@formal@none@1@S@Supervised neural networks that use an [[Mean squared error|MSE]] cost function can use formal statistical methods to determine the confidence of the trained model.@@@@1@24@@danf@17-8-2009 10052720@unknown@formal@none@1@S@The MSE on a validation set can be used as an estimate for variance.@@@@1@14@@danf@17-8-2009 10052730@unknown@formal@none@1@S@This value can then be used to calculate the [[confidence interval]] of the output of the network, assuming a [[normal distribution]].@@@@1@21@@danf@17-8-2009 10052740@unknown@formal@none@1@S@A confidence analysis made this way is statistically valid as long as the output [[probability distribution]] stays the same and the network is not modified.@@@@1@25@@danf@17-8-2009 10052750@unknown@formal@none@1@S@By assigning a softmax activation function on the output layer of the neural network (or a softmax component in a component-based neural network) for categorical target variables, the outputs can be interpreted as posterior probabilities.@@@@1@35@@danf@17-8-2009 10052760@unknown@formal@none@1@S@This is very useful in classification as it gives a certainty measure on classifications.@@@@1@14@@danf@17-8-2009 10052770@unknown@formal@none@1@S@The softmax activation function: y_i=\\frac{e^{x_i}}{\\sum_{j=1}^c e^{x_j}}@@@@1@6@@danf@17-8-2009 10052780@unknown@formal@none@1@S@===Dynamic properties===@@@@1@2@@danf@17-8-2009 10052790@unknown@formal@none@1@S@Various techniques originally developed for studying disordered magnetic systems (i.e. the [[spin glass]]) have been successfully applied to simple neural network architectures, such as the Hopfield network.@@@@1@27@@danf@17-8-2009 10052800@unknown@formal@none@1@S@Influential work by E. Gardner and B. Derrida has revealed many interesting properties about perceptrons with real-valued synaptic weights, while later work by W. Krauth and M. Mezard has extended these principles to binary-valued synapses.@@@@1@35@@danf@17-8-2009 10060010@unknown@formal@none@1@S@
Association for Computational Linguistics
@@@@1@4@@danf@17-8-2009 10060020@unknown@formal@none@1@S@The '''Association for Computational Linguistics''' ('''ACL''') is the international scientific and professional society for people working on problems involving [[natural language and computation]].@@@@1@23@@danf@17-8-2009 10060030@unknown@formal@none@1@S@An annual meeting is held each summer in locations where significant [[computational linguistics]] research is carried out.@@@@1@17@@danf@17-8-2009 10060040@unknown@formal@none@1@S@It was founded in [[1962]], originally named the '''Association for Machine Translation and Computational Linguistics''' ('''AMTCL''').@@@@1@16@@danf@17-8-2009 10060050@unknown@formal@none@1@S@It became the ACL in [[1968]].@@@@1@6@@danf@17-8-2009 10060060@unknown@formal@none@1@S@The ACL has [[Europe]]an and [[North America]]n chapters, the [[European Chapter of the Association for Computational Linguistics]] (EACL) and the [[North American Chapter of the Association for Computational Linguistics]] (NAACL).@@@@1@30@@danf@17-8-2009 10060070@unknown@formal@none@1@S@The ACL journal, [[Computational Linguistics (journal)|''Computational Linguistics'']], continues to be the primary forum for research on computational linguistics and [[natural language processing]].@@@@1@22@@danf@17-8-2009 10060080@unknown@formal@none@1@S@Since [[1988]], the journal has been published for the ACL by [[MIT Press]].@@@@1@13@@danf@17-8-2009 10060090@unknown@formal@none@1@S@The ACL book series, [[Studies_in_NLP|''Studies in Natural Language Processing'']], is published by [[Cambridge University Press]].@@@@1@15@@danf@17-8-2009 10060100@unknown@formal@none@1@S@==Special Interest Groups==@@@@1@3@@danf@17-8-2009 10060110@unknown@formal@none@1@S@ACL has a large number of Special Interest Groups (SIGs), focusing on specific areas of natural language processing.@@@@1@18@@danf@17-8-2009 10060120@unknown@formal@none@1@S@Some current SIGs within ACL are:@@@@1@6@@danf@17-8-2009 10060130@unknown@formal@none@1@S@*Linguistic data and corpus-based approaches: [http://www.aclweb.org/sigdat SIGDAT]@@@@1@7@@danf@17-8-2009 10060140@unknown@formal@none@1@S@*Dialogue Processing: [http://www.aclweb.org/sigdial SIGDIAL]@@@@1@4@@danf@17-8-2009 10060150@unknown@formal@none@1@S@*Natural Language Generation: [http://www.siggen.org SIGGEN]@@@@1@5@@danf@17-8-2009 10060160@unknown@formal@none@1@S@*Lexicon: [http://www.aclweb.org/siglex SIGLEX]@@@@1@3@@danf@17-8-2009 10060170@unknown@formal@none@1@S@*Mathematics of Language: [http://molweb.org SIGMOL]@@@@1@5@@danf@17-8-2009 10060180@unknown@formal@none@1@S@*Computational Morphology and Phonology: [http://salad.cs.swarthmore.edu/sigphon SIGMORPHON]@@@@1@6@@danf@17-8-2009 10060190@unknown@formal@none@1@S@*Computational Semantics: [http://www.aclweb.org/sigsem/ SIGSEM]@@@@1@4@@danf@17-8-2009 10070010@unknown@formal@none@1@S@
Babel Fish (website)
@@@@1@3@@danf@17-8-2009 10070020@unknown@formal@none@1@S@'''Babel Fish''' is a [[World Wide Web|web]]-based application on [[Yahoo!]] that [[machine translation|machine translates]] text or web pages from one of several languages into another.@@@@1@25@@danf@17-8-2009 10070030@unknown@formal@none@1@S@Developed by [[AltaVista]], the application is named after the fictional animal used for instantaneous [[language]] [[translation]] in [[Douglas Adams]]'s series ''[[The Hitchhiker's Guide to the Galaxy]].''@@@@1@26@@danf@17-8-2009 10070040@unknown@formal@none@1@S@In turn the fish is a reference to the [[biblical]] account of the city of [[Babel]] and the [[Confusion of tongues|various languages]] said to have arisen there.@@@@1@27@@danf@17-8-2009 10070050@unknown@formal@none@1@S@The translation technology for Babel Fish is provided by [[SYSTRAN]], whose technology also powers a number of other sites and portals.@@@@1@21@@danf@17-8-2009 10070060@unknown@formal@none@1@S@It translates among [[English language|English]], [[Simplified Chinese]], [[Traditional Chinese]], [[Dutch language|Dutch]], [[French language|French]], [[German language|German]], [[Greek language|Greek]], [[Italian language|Italian]], [[Japanese language|Japanese]], [[Korean language|Korean]], [[Portuguese language|Portuguese]], [[Russian language|Russian]], and [[Spanish language|Spanish]].@@@@1@30@@danf@17-8-2009 10070070@unknown@formal@none@1@S@The service makes no claim to produce a perfect translation.@@@@1@10@@danf@17-8-2009 10070080@unknown@formal@none@1@S@A number of humour sites have sprung up that use the Babel Fish service to translate back and forth between one or more languages (a so-called [[round-trip translation]]).@@@@1@28@@danf@17-8-2009 10070090@unknown@formal@none@1@S@After a long existence at babelfish.altavista.com, the site was moved on May 9 2008 to babelfish.yahoo.com.@@@@1@16@@danf@17-8-2009 10080010@unknown@formal@none@1@S@
Bioinformatics
@@@@1@1@@danf@17-8-2009 10080020@unknown@formal@none@1@S@'''Bioinformatics''' and '''computational biology''' involve the use of techniques including [[applied mathematics]], [[informatics]], [[statistics]], [[computer science]], [[artificial intelligence]], [[chemistry]], and [[biochemistry]] to solve [[biology|biological]] problems usually on the [[molecular]] level.@@@@1@30@@danf@17-8-2009 10080030@unknown@formal@none@1@S@The core principle of these techniques is using computing resources in order to solve problems on scales of magnitude far too great for human discernment.@@@@1@25@@danf@17-8-2009 10080040@unknown@formal@none@1@S@Research in computational biology often overlaps with [[systems biology]].@@@@1@9@@danf@17-8-2009 10080050@unknown@formal@none@1@S@Major research efforts in the field include [[sequence alignment]], [[gene finding]], [[genome assembly]], [[protein structural alignment|protein structure alignment]], [[protein structure prediction]], prediction of [[gene expression]] and [[protein-protein interactions]], and the modeling of [[evolution]].@@@@1@33@@danf@17-8-2009 10080060@unknown@formal@none@1@S@==Introduction==@@@@1@1@@danf@17-8-2009 10080070@unknown@formal@none@1@S@The terms '''''bioinformatics''''' and ''[[computational biology]]'' are often used interchangeably.@@@@1@10@@danf@17-8-2009 10080080@unknown@formal@none@1@S@However ''bioinformatics'' more properly refers to the creation and advancement of algorithms, computational and statistical techniques, and theory to solve formal and practical problems arising from the management and analysis of biological data.@@@@1@33@@danf@17-8-2009 10080090@unknown@formal@none@1@S@Computational biology, on the other hand, refers to hypothesis-driven investigation of a specific biological problem using computers, carried out with experimental or simulated data, with the primary goal of discovery and the advancement of biological knowledge.@@@@1@36@@danf@17-8-2009 10080100@unknown@formal@none@1@S@Put more simply, bioinformatics is concerned with the information while computational biology is concerned with the hypotheses.@@@@1@17@@danf@17-8-2009 10080110@unknown@formal@none@1@S@A similar distinction is made by [[NIH|National Institutes of Health]] in their [http://www.bisti.nih.gov/CompuBioDef.pdf working definitions of Bioinformatics and Computational Biology], where it is further emphasized that there is a tight coupling of developments and knowledge between the more hypothesis-driven research in computational biology and technique-driven research in bioinformatics.@@@@1@48@@danf@17-8-2009 10080120@unknown@formal@none@1@S@Bioinformatics is also often specified as an applied subfield of the more general discipline of [[Biomedical informatics]].@@@@1@17@@danf@17-8-2009 10080130@unknown@formal@none@1@S@A common thread in projects in bioinformatics and computational biology is the use of mathematical tools to extract useful information from data produced by high-throughput biological techniques such as [[genome sequencing]].@@@@1@31@@danf@17-8-2009 10080140@unknown@formal@none@1@S@A representative problem in bioinformatics is the assembly of high-quality genome sequences from fragmentary "shotgun" DNA [[sequencing]].@@@@1@17@@danf@17-8-2009 10080150@unknown@formal@none@1@S@Other common problems include the study of [[gene regulation]] to perform [[expression profiling]] using data from [[DNA microarray|microarrays]] or [[mass spectrometry]].@@@@1@21@@danf@17-8-2009 10080160@unknown@formal@none@1@S@==Major research 
areas==@@@@1@3@@danf@17-8-2009 10080170@unknown@formal@none@1@S@===Sequence analysis===@@@@1@2@@danf@17-8-2009 10080180@unknown@formal@none@1@S@Since the [[Phi-X174 phage|Phage Φ-X174]] was [[sequencing|sequenced]] in 1977, the [[DNA sequence]]s of hundreds of organisms have been decoded and stored in databases.@@@@1@23@@danf@17-8-2009 10080190@unknown@formal@none@1@S@The information is analyzed to determine genes that encode [[polypeptides]], as well as regulatory sequences.@@@@1@15@@danf@17-8-2009 10080200@unknown@formal@none@1@S@A comparison of genes within a [[species]] or between different species can show similarities between protein functions, or relations between species (the use of [[molecular systematics]] to construct [[phylogenetic tree]]s).@@@@1@30@@danf@17-8-2009 10080210@unknown@formal@none@1@S@With the growing amount of data, it long ago became impractical to analyze DNA sequences manually.@@@@1@16@@danf@17-8-2009 10080220@unknown@formal@none@1@S@Today, [[computer program]]s are used to search the [[genome]] of thousands of organisms, containing billions of [[nucleotide]]s.@@@@1@17@@danf@17-8-2009 10080230@unknown@formal@none@1@S@These programs would compensate for mutations (exchanged, deleted or inserted bases) in the DNA sequence, in order to identify sequences that are related, but not identical.@@@@1@26@@danf@17-8-2009 10080240@unknown@formal@none@1@S@A variant of this [[sequence alignment]] is used in the sequencing process itself.@@@@1@13@@danf@17-8-2009 10080250@unknown@formal@none@1@S@The so-called [[shotgun sequencing]] technique (which was used, for example, by [[The Institute for Genomic Research]] to sequence the first bacterial genome, ''Haemophilus influenzae'') does not give a sequential list of nucleotides, but instead the sequences of thousands of small DNA fragments (each about 600-800 nucleotides long).@@@@1@47@@danf@17-8-2009 10080260@unknown@formal@none@1@S@The ends of these fragments overlap and, when aligned in the right way, make up the complete genome.@@@@1@18@@danf@17-8-2009 10080270@unknown@formal@none@1@S@Shotgun sequencing yields sequence data quickly, but the task of assembling the fragments can be quite complicated for larger genomes.@@@@1@20@@danf@17-8-2009 10080280@unknown@formal@none@1@S@In the case of the [[Human Genome Project]], it took several months of CPU time (on a circa-2000 vintage [[DEC Alpha]] computer) to assemble the fragments.@@@@1@26@@danf@17-8-2009 10080290@unknown@formal@none@1@S@Shotgun sequencing is the method of choice for virtually all genomes sequenced today, and genome assembly algorithms are a critical area of bioinformatics research.@@@@1@24@@danf@17-8-2009 10080300@unknown@formal@none@1@S@Another aspect of bioinformatics in sequence analysis is the automatic [[gene finding|search for genes]] and regulatory sequences within a genome.@@@@1@20@@danf@17-8-2009 10080310@unknown@formal@none@1@S@Not all of the nucleotides within a genome are genes.@@@@1@10@@danf@17-8-2009 10080320@unknown@formal@none@1@S@Within the genome of higher organisms, large parts of the DNA do not serve any obvious purpose.@@@@1@17@@danf@17-8-2009 10080330@unknown@formal@none@1@S@This so-called [[junk DNA]] may, however, contain unrecognized functional elements.@@@@1@10@@danf@17-8-2009 10080340@unknown@formal@none@1@S@Bioinformatics helps to bridge the gap between genome and [[proteome]] projects--for example, in the use of DNA sequences for protein identification.@@@@1@21@@danf@17-8-2009 10080350@unknown@formal@none@1@S@''See also:'' [[sequence 
analysis]], [[sequence profiling tool]], [[sequence motif]].@@@@1@9@@danf@17-8-2009 10080360@unknown@formal@none@1@S@===Genome annotation===@@@@1@2@@danf@17-8-2009 10080370@unknown@formal@none@1@S@In the context of [[genomics]], '''annotation''' is the process of marking the genes and other biological features in a DNA sequence.@@@@1@21@@danf@17-8-2009 10080380@unknown@formal@none@1@S@The first genome annotation software system was designed in 1995 by Dr. Owen White, who was part of the team that sequenced and analyzed the first genome of a free-living organism to be decoded, the bacterium ''[[Haemophilus influenzae]]''.@@@@1@38@@danf@17-8-2009 10080390@unknown@formal@none@1@S@Dr. White built a software system to find the genes (places in the DNA sequence that encode a protein), the transfer RNA, and other features, and to make initial assignments of function to those genes.@@@@1@35@@danf@17-8-2009 10080400@unknown@formal@none@1@S@Most current genome annotation systems work similarly, but the programs available for analysis of genomic DNA are constantly changing and improving.@@@@1@21@@danf@17-8-2009 10080410@unknown@formal@none@1@S@===Computational evolutionary biology===@@@@1@3@@danf@17-8-2009 10080420@unknown@formal@none@1@S@[[Evolutionary biology]] is the study of the origin and descent of [[species]], as well as their change over time.@@@@1@19@@danf@17-8-2009 10080430@unknown@formal@none@1@S@Informatics has assisted evolutionary biologists in several key ways; it has enabled researchers to:@@@@1@14@@danf@17-8-2009 10080440@unknown@formal@none@1@S@*trace the evolution of a large number of organisms by measuring changes in their [[DNA]], rather than through [[physical taxonomy]] or physiological observations alone,@@@@1@24@@danf@17-8-2009 10080450@unknown@formal@none@1@S@*more recently, compare entire [[genomes]], which permits the study of more complex evolutionary events, such as [[gene duplication]], [[lateral gene transfer]], and the prediction of factors important in bacterial [[speciation]],@@@@1@30@@danf@17-8-2009 10080460@unknown@formal@none@1@S@*build complex computational models of populations to predict the outcome of the system over time@@@@1@15@@danf@17-8-2009 10080470@unknown@formal@none@1@S@*track and share information on an increasingly large number of species and organisms@@@@1@13@@danf@17-8-2009 10080480@unknown@formal@none@1@S@Future work endeavours to reconstruct the now more complex [[Evolutionary tree|tree of life]].@@@@1@13@@danf@17-8-2009 10080490@unknown@formal@none@1@S@The area of research within [[computer science]] that uses [[genetic algorithm]]s is sometimes confused with [[computational evolutionary biology]], but the two areas are unrelated.@@@@1@24@@danf@17-8-2009 10080500@unknown@formal@none@1@S@===Measuring biodiversity===@@@@1@2@@danf@17-8-2009 10080510@unknown@formal@none@1@S@[[Biodiversity]] of an ecosystem might be defined as the total genomic complement of a particular environment, from all of the species present, whether it is a biofilm in an abandoned mine, a drop of sea water, a scoop of soil, or the entire [[biosphere]] of the planet [[Earth]].@@@@1@48@@danf@17-8-2009 10080520@unknown@formal@none@1@S@Databases are used to collect the [[species]] names, descriptions, distributions, genetic information, status and size of [[population]]s, [[Habitat (ecology)|habitat]] needs, and how each organism interacts with other species.@@@@1@28@@danf@17-8-2009 10080530@unknown@formal@none@1@S@Specialized [[computer software|software]] programs are used to 
find, visualize, and analyze the information, and most importantly, communicate it to other people.@@@@1@21@@danf@17-8-2009 10080540@unknown@formal@none@1@S@Computer simulations model such things as population dynamics, or calculate the cumulative genetic health of a breeding pool (in [[agriculture]]) or endangered population (in [[conservation ecology|conservation]]).@@@@1@26@@danf@17-8-2009 10080550@unknown@formal@none@1@S@One very exciting potential of this field is that entire [[DNA]] sequences, or [[genome]]s of [[endangered species]] can be preserved, allowing the results of Nature's genetic experiment to be remembered ''[[in silico]]'', and possibly reused in the future, even if that species is eventually lost.@@@@1@45@@danf@17-8-2009 10080560@unknown@formal@none@1@S@''Important projects:'' [http://www.sp2000.org/ Species 2000 project]; [http://www.ubio.org/ uBio Project].@@@@1@9@@danf@17-8-2009 10080570@unknown@formal@none@1@S@===Analysis of gene expression===@@@@1@4@@danf@17-8-2009 10080580@unknown@formal@none@1@S@The [[gene expression|expression]] of many genes can be determined by measuring [[mRNA]] levels with multiple techniques including [[DNA microarray|microarrays]], [[expressed sequence tag|expressed cDNA sequence tag]] (EST) sequencing, [[serial analysis of gene expression]] (SAGE) tag sequencing, [[massively parallel signature sequencing]] (MPSS), or various applications of multiplexed in-situ hybridization.@@@@1@47@@danf@17-8-2009 10080590@unknown@formal@none@1@S@All of these techniques are extremely noise-prone and/or subject to bias in the biological measurement, and a major research area in computational biology involves developing statistical tools to separate [[signal (information theory)|signal]] from [[noise]] in high-throughput gene expression studies.@@@@1@39@@danf@17-8-2009 10080600@unknown@formal@none@1@S@Such studies are often used to determine the genes implicated in a disorder: one might compare microarray data from cancerous [[epithelial]] cells to data from non-cancerous cells to determine the transcripts that are up-regulated and down-regulated in a particular population of cancer cells.@@@@1@43@@danf@17-8-2009 10080610@unknown@formal@none@1@S@===Analysis of regulation===@@@@1@3@@danf@17-8-2009 10080620@unknown@formal@none@1@S@Regulation is the complex orchestration of events starting with an extracellular signal such as a [[hormone]] and leading to an increase or decrease in the activity of one or more [[protein]]s.@@@@1@31@@danf@17-8-2009 10080630@unknown@formal@none@1@S@Bioinformatics techniques have been applied to explore various steps in this process.@@@@1@12@@danf@17-8-2009 10080640@unknown@formal@none@1@S@For example, [[promoter analysis]] involves the identification and study of [[sequence motif]]s in the DNA surrounding the coding region of a gene.@@@@1@22@@danf@17-8-2009 10080650@unknown@formal@none@1@S@These motifs influence the extent to which that region is transcribed into mRNA.@@@@1@13@@danf@17-8-2009 10080660@unknown@formal@none@1@S@Expression data can be used to infer gene regulation: one might compare [[microarray]] data from a wide variety of states of an organism to form hypotheses about the genes involved in each state.@@@@1@33@@danf@17-8-2009 10080670@unknown@formal@none@1@S@In a single-cell organism, one might compare stages of the [[cell cycle]], along with various stress conditions (heat shock, starvation, etc.).@@@@1@21@@danf@17-8-2009 10080680@unknown@formal@none@1@S@One can then apply [[cluster analysis|clustering 
algorithms]] to that expression data to determine which genes are co-expressed.@@@@1@17@@danf@17-8-2009 10080690@unknown@formal@none@1@S@For example, the upstream regions (promoters) of co-expressed genes can be searched for over-represented [[regulatory elements]].@@@@1@16@@danf@17-8-2009 10080700@unknown@formal@none@1@S@===Analysis of protein expression===@@@@1@4@@danf@17-8-2009 10080710@unknown@formal@none@1@S@[[Protein microarray]]s and high throughput (HT) [[mass spectrometry]] (MS) can provide a snapshot of the proteins present in a biological sample.@@@@1@21@@danf@17-8-2009 10080720@unknown@formal@none@1@S@Bioinformatics is very much involved in making sense of protein microarray and HT MS data; the former approach faces similar problems as with microarrays targeted at mRNA, the latter involves the problem of matching large amounts of mass data against predicted masses from protein sequence databases, and the complicated statistical analysis of samples where multiple, but incomplete peptides from each protein are detected.@@@@1@63@@danf@17-8-2009 10080730@unknown@formal@none@1@S@===Analysis of mutations in cancer===@@@@1@5@@danf@17-8-2009 10080740@unknown@formal@none@1@S@In cancer, the genomes of affected cells are rearranged in complex or even unpredictable ways.@@@@1@15@@danf@17-8-2009 10080750@unknown@formal@none@1@S@Massive sequencing efforts are used to identify previously unknown [[point mutation]]s in a variety of [[gene]]s in [[cancer]].@@@@1@18@@danf@17-8-2009 10080760@unknown@formal@none@1@S@Bioinformaticians continue to produce specialized automated systems to manage the sheer volume of sequence data produced, and they create new algorithms and software to compare the sequencing results to the growing collection of [[human genome]] sequences and [[germline]] polymorphisms.@@@@1@39@@danf@17-8-2009 10080770@unknown@formal@none@1@S@New physical detection technology are employed, such as [[oligonucleotide]] microarrays to identify chromosomal gains and losses (called [[comparative genomic hybridization]]), and [[single nucleotide polymorphism]] arrays to detect known ''point mutations''.@@@@1@30@@danf@17-8-2009 10080780@unknown@formal@none@1@S@These detection methods simultaneously measure several hundred thousand sites throughout the genome, and when used in high-throughput to measure thousands of samples, generate [[terabyte]]s of data per experiment.@@@@1@28@@danf@17-8-2009 10080790@unknown@formal@none@1@S@Again the massive amounts and new types of data generate new opportunities for bioinformaticians.@@@@1@14@@danf@17-8-2009 10080800@unknown@formal@none@1@S@The data is often found to contain considerable variability, or [[noise]], and thus [[Hidden Markov model]] and [[change-point analysis]] methods are being developed to infer real [[copy number variation|copy number]] changes.@@@@1@31@@danf@17-8-2009 10080810@unknown@formal@none@1@S@Another type of data that requires novel informatics development is the analysis of lesions found to be recurrent among many tumors .@@@@1@22@@danf@17-8-2009 10080820@unknown@formal@none@1@S@===Prediction of protein structure===@@@@1@4@@danf@17-8-2009 10080830@unknown@formal@none@1@S@Protein structure prediction is another important application of bioinformatics.@@@@1@9@@danf@17-8-2009 10080840@unknown@formal@none@1@S@The [[amino acid]] sequence of a protein, the so-called [[primary structure]], can be easily determined from the sequence on the gene that codes for it.@@@@1@25@@danf@17-8-2009 10080850@unknown@formal@none@1@S@In the vast 
majority of cases, this primary structure uniquely determines a structure in its native environment.@@@@1@17@@danf@17-8-2009 10080860@unknown@formal@none@1@S@(Of course, there are exceptions, such as the [[bovine spongiform encephalopathy]] - aka [[Mad Cow Disease]] - [[prion]].)@@@@1@18@@danf@17-8-2009 10080870@unknown@formal@none@1@S@Knowledge of this structure is vital in understanding the function of the protein.@@@@1@13@@danf@17-8-2009 10080880@unknown@formal@none@1@S@For lack of better terms, structural information is usually classified as one of ''[[secondary structure|secondary]]'', ''[[tertiary structure|tertiary]]'' and ''[[quaternary structure|quaternary]]'' structure.@@@@1@21@@danf@17-8-2009 10080890@unknown@formal@none@1@S@A viable general solution to such predictions remains an open problem.@@@@1@11@@danf@17-8-2009 10080900@unknown@formal@none@1@S@As of now, most efforts have been directed towards heuristics that work most of the time.@@@@1@16@@danf@17-8-2009 10080910@unknown@formal@none@1@S@One of the key ideas in bioinformatics is the notion of [[homology (biology)|homology]].@@@@1@13@@danf@17-8-2009 10080920@unknown@formal@none@1@S@In the genomic branch of bioinformatics, homology is used to predict the function of a gene: if the sequence of gene ''A'', whose function is known, is homologous to the sequence of gene ''B,'' whose function is unknown, one could infer that B may share A's function.@@@@1@47@@danf@17-8-2009 10080930@unknown@formal@none@1@S@In the structural branch of bioinformatics, homology is used to determine which parts of a protein are important in structure formation and interaction with other proteins.@@@@1@26@@danf@17-8-2009 10080940@unknown@formal@none@1@S@In a technique called homology modeling, this information is used to predict the structure of a protein once the structure of a homologous protein is known.@@@@1@26@@danf@17-8-2009 10080950@unknown@formal@none@1@S@This currently remains the only way to predict protein structures reliably.@@@@1@11@@danf@17-8-2009 10080960@unknown@formal@none@1@S@One example of this is the similar protein homology between hemoglobin in humans and the hemoglobin in legumes ([[leghemoglobin]]).@@@@1@19@@danf@17-8-2009 10080970@unknown@formal@none@1@S@Both serve the same purpose of transporting oxygen in the organism.@@@@1@11@@danf@17-8-2009 10080980@unknown@formal@none@1@S@Though both of these proteins have completely different amino acid sequences, their protein structures are virtually identical, which reflects their near identical purposes.@@@@1@23@@danf@17-8-2009 10080990@unknown@formal@none@1@S@Other techniques for predicting protein structure include protein threading and ''de novo'' (from scratch) physics-based modeling.@@@@1@16@@danf@17-8-2009 10081000@unknown@formal@none@1@S@''See also:'' [[structural motif]] and [[structural domain]].@@@@1@7@@danf@17-8-2009 10081010@unknown@formal@none@1@S@=== Comparative genomics ===@@@@1@4@@danf@17-8-2009 10081020@unknown@formal@none@1@S@The core of comparative genome analysis is the establishment of the correspondence between [[genes]] (orthology analysis) or other genomic features in different organisms.@@@@1@23@@danf@17-8-2009 10081030@unknown@formal@none@1@S@It is these intergenomic maps that make it possible to trace the evolutionary processes responsible for the divergence of two genomes.@@@@1@21@@danf@17-8-2009 10081040@unknown@formal@none@1@S@A multitude of evolutionary events acting at various organizational levels shape genome evolution.@@@@1@13@@danf@17-8-2009 
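As a concrete illustration of the basic sequence-comparison primitive on which such gene-to-gene correspondences rest, the following minimal sketch (an illustrative assumption, not code from the article) computes a pairwise global alignment score with the standard dynamic-programming recurrence; the match, mismatch and gap scores are arbitrary example values.
 # Illustrative sketch (assumed example, not from the article): pairwise global
 # alignment score via dynamic programming, a basic primitive behind homology
 # and orthology detection.
 def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
     # dp[i][j] = best score for aligning a[:i] with b[:j]
     dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
     for i in range(1, len(a) + 1):
         dp[i][0] = i * gap
     for j in range(1, len(b) + 1):
         dp[0][j] = j * gap
     for i in range(1, len(a) + 1):
         for j in range(1, len(b) + 1):
             diag = dp[i-1][j-1] + (match if a[i-1] == b[j-1] else mismatch)
             dp[i][j] = max(diag, dp[i-1][j] + gap, dp[i][j-1] + gap)
     return dp[-1][-1]
 
 # A substitution (point mutation) costs less here than an insertion or deletion.
 print(global_alignment_score("GATTACA", "GCTTACA"))   # one mismatch: score 5
 print(global_alignment_score("GATTACA", "GATTCA"))    # one gap: score 4
The distinction the toy scores draw between substitutions and gaps mirrors the different levels of evolutionary change described next.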
10081050@unknown@formal@none@1@S@At the lowest level, point mutations affect individual nucleotides.@@@@1@9@@danf@17-8-2009 10081060@unknown@formal@none@1@S@At a higher level, large chromosomal segments undergo duplication, lateral transfer, inversion, transposition, deletion and insertion.@@@@1@16@@danf@17-8-2009 10081070@unknown@formal@none@1@S@Ultimately, whole genomes are involved in processes of hybridization, polyploidization and [[endosymbiosis]], often leading to rapid speciation.@@@@1@17@@danf@17-8-2009 10081080@unknown@formal@none@1@S@The complexity of genome evolution poses many exciting challenges to developers of mathematical models and algorithms, who have recourse to a spectra of algorithmic, statistical and mathematical techniques, ranging from exact, [[heuristics]], fixed parameter and [[approximation algorithms]] for problems based on parsimony models to [[Markov Chain Monte Carlo]] algorithms for [[Bayesian analysis]] of problems based on probabilistic models.@@@@1@58@@danf@17-8-2009 10081090@unknown@formal@none@1@S@Many of these studies are based on the homology detection and protein families computation.@@@@1@14@@danf@17-8-2009 10081100@unknown@formal@none@1@S@===Modeling biological systems===@@@@1@3@@danf@17-8-2009 10081110@unknown@formal@none@1@S@Systems biology involves the use of [[computer simulation]]s of [[cell (biology)|cellular]] subsystems (such as the [[metabolic network|networks of metabolites]] and [[enzyme]]s which comprise [[metabolism]], [[signal transduction]] pathways and [[gene regulatory network]]s) to both analyze and visualize the complex connections of these cellular processes.@@@@1@43@@danf@17-8-2009 10081120@unknown@formal@none@1@S@[[Artificial life]] or virtual evolution attempts to understand evolutionary processes via the computer simulation of simple (artificial) life forms.@@@@1@19@@danf@17-8-2009 10081130@unknown@formal@none@1@S@===High-throughput image analysis===@@@@1@3@@danf@17-8-2009 10081140@unknown@formal@none@1@S@Computational technologies are used to accelerate or fully automate the processing, quantification and analysis of large amounts of high-information-content [[biomedical imagery]].@@@@1@21@@danf@17-8-2009 10081150@unknown@formal@none@1@S@Modern image analysis systems augment an observer's ability to make measurements from a large or complex set of images, by improving [[accuracy]], [[Objectivity (science)|objectivity]], or speed.@@@@1@26@@danf@17-8-2009 10081160@unknown@formal@none@1@S@A fully developed analysis system may completely replace the observer.@@@@1@10@@danf@17-8-2009 10081170@unknown@formal@none@1@S@Although these systems are not unique to biomedical imagery, biomedical imaging is becoming more important for both [[diagnostics]] and research.@@@@1@20@@danf@17-8-2009 10081180@unknown@formal@none@1@S@Some examples are:@@@@1@3@@danf@17-8-2009 10081190@unknown@formal@none@1@S@* high-throughput and high-fidelity quantification and sub-cellular localization ([[high-content screening]], [[cytohistopathology]])@@@@1@11@@danf@17-8-2009 10081200@unknown@formal@none@1@S@* [[morphometrics]]@@@@1@2@@danf@17-8-2009 10081210@unknown@formal@none@1@S@* clinical image analysis and visualization@@@@1@6@@danf@17-8-2009 10081220@unknown@formal@none@1@S@* determining the real-time air-flow patterns in breathing lungs of living animals@@@@1@12@@danf@17-8-2009 10081230@unknown@formal@none@1@S@* quantifying occlusion size in real-time imagery from the development of and recovery during arterial injury@@@@1@16@@danf@17-8-2009 
10081240@unknown@formal@none@1@S@* making behavioral observations from extended video recordings of laboratory animals@@@@1@11@@danf@17-8-2009 10081250@unknown@formal@none@1@S@* infrared measurements for metabolic activity determination@@@@1@7@@danf@17-8-2009 10081260@unknown@formal@none@1@S@===Protein-protein docking===@@@@1@2@@danf@17-8-2009 10081270@unknown@formal@none@1@S@In the last two decades, tens of thousands of protein three-dimensional structures have been determined by [[X-ray crystallography]] and [[Protein nuclear magnetic resonance spectroscopy]] (protein NMR).@@@@1@26@@danf@17-8-2009 10081280@unknown@formal@none@1@S@One central question for the biological scientist is whether it is practical to predict possible protein-protein interactions only based on these 3D shapes, without doing [[protein-protein interaction]] experiments.@@@@1@28@@danf@17-8-2009 10081290@unknown@formal@none@1@S@A variety of methods have been developed to tackle the [[Protein-protein docking]] problem, though it seems that there is still much room for further work in this field.@@@@1@28@@danf@17-8-2009 10081300@unknown@formal@none@1@S@===Software and Tools===@@@@1@3@@danf@17-8-2009 10081310@unknown@formal@none@1@S@Software tools for bioinformatics range from simple command-line tools to more complex graphical programs and standalone web-services.@@@@1@17@@danf@17-8-2009 10081320@unknown@formal@none@1@S@The computational biology tool best-known among biologists is probably [[BLAST]], an algorithm for determining the similarity of arbitrary sequences against other sequences, possibly from curated databases of protein or DNA sequences.@@@@1@31@@danf@17-8-2009 10081330@unknown@formal@none@1@S@The [[National Center for Biotechnology Information|NCBI]] provides a popular web-based implementation that searches their databases.@@@@1@15@@danf@17-8-2009 10081340@unknown@formal@none@1@S@BLAST is one of a number of [[List of sequence alignment software|generally available programs]] for doing sequence alignment.@@@@1@18@@danf@17-8-2009 10081350@unknown@formal@none@1@S@===Web Services in Bioinformatics===@@@@1@4@@danf@17-8-2009 10081360@unknown@formal@none@1@S@[[SOAP]] and [[REST]]-based interfaces have been developed for a wide variety of bioinformatics applications allowing an application running on one computer in one part of the world to use algorithms, data and computing resources on servers in other parts of the world.@@@@1@42@@danf@17-8-2009 10081370@unknown@formal@none@1@S@The main advantages lie in the end user not having to deal with software and database maintenance overheads.@@@@1@18@@danf@17-8-2009 10081375@unknown@formal@none@1@S@Basic bioinformatics services are classified by the [[EBI]] into three categories: [[Sequence alignment software|SSS]] (Sequence Search Services), [[Multiple sequence alignment|MSA]] (Multiple Sequence Alignment) and [[Bioinformatics#Sequence_analysis|BSA]] (Biological Sequence Analysis).@@@@1@28@@danf@17-8-2009 10081380@unknown@formal@none@1@S@The availability of these [[service-oriented]] bioinformatics resources demonstrates the applicability of web-based bioinformatics solutions; such resources range from a collection of standalone tools with a common data format under a single, standalone or web-based interface, to integrative, distributed and extensible [[bioinformatics workflow management systems]].@@@@1@44@@danf@17-8-2009 10090010@unknown@formal@none@1@S@
BLEU
@@@@1@1@@danf@17-8-2009 10090020@unknown@formal@none@1@S@:''This page is about the evaluation metric for machine translation.@@@@1@10@@danf@17-8-2009 10090030@unknown@formal@none@1@S@For other meanings, please see [[Bleu]].''@@@@1@6@@danf@17-8-2009 10090040@unknown@formal@none@1@S@'''BLEU''' ('''Bilingual Evaluation Understudy''') is a method for evaluating the quality of text which has been translated from one [[natural language]] to another using [[machine translation]].@@@@1@26@@danf@17-8-2009 10090050@unknown@formal@none@1@S@BLEU was one of the first [[software metric]]s to report high [[correlation]] with human judgements of quality.@@@@1@17@@danf@17-8-2009 10090060@unknown@formal@none@1@S@The metric is currently one of the most popular in the field.@@@@1@12@@danf@17-8-2009 10090070@unknown@formal@none@1@S@The central idea behind the metric is that, "the closer a machine translation is to a professional human translation, the better it is".@@@@1@23@@danf@17-8-2009 10090080@unknown@formal@none@1@S@The metric calculates scores for individual segments, generally [[Sentence (linguistics)|sentence]]s, and then averages these scores over the whole [[corpus]] in order to reach a final score.@@@@1@26@@danf@17-8-2009 10090090@unknown@formal@none@1@S@It has been shown to correlate highly with human judgements of quality at the corpus level.@@@@1@16@@danf@17-8-2009 10090100@unknown@formal@none@1@S@The quality of translation is indicated as a number between 0 and 1 and is measured as statistical closeness to a given set of good quality human reference translations.@@@@1@29@@danf@17-8-2009 10090110@unknown@formal@none@1@S@Therefore, it does not directly take into account translation intelligibility or grammatical correctness.@@@@1@13@@danf@17-8-2009 10090120@unknown@formal@none@1@S@The metric works by measuring the [[n-gram]] co-occurrence between a given translation and the set of reference translations and then taking the weighted [[geometric mean]].@@@@1@25@@danf@17-8-2009 10090130@unknown@formal@none@1@S@BLEU is specifically designed to approximate human judgement on a [[corpus]] level and performs badly if used to evaluate the quality of isolated sentences.@@@@1@24@@danf@17-8-2009 10090140@unknown@formal@none@1@S@==Algorithm==@@@@1@1@@danf@17-8-2009 10090150@unknown@formal@none@1@S@BLEU uses a modified form of [[precision]] to compare a candidate translation against multiple reference translations.@@@@1@16@@danf@17-8-2009 10090160@unknown@formal@none@1@S@The metric modifies simple precision since machine translation systems have been known to generate more words than appear in a reference text.@@@@1@22@@danf@17-8-2009 10090170@unknown@formal@none@1@S@This is illustrated in the following example from Papineni et al. 
(2002),@@@@1@12@@danf@17-8-2009 10090180@unknown@formal@none@1@S@In this example, the candidate text is the seven-word string "the the the the the the the", scored against the two reference translations "the cat is on the mat" and "there is a cat on the mat", and it is given a unigram precision of,@@@@1@12@@danf@17-8-2009 10090190@unknown@formal@none@1@S@:P = \\frac{m}{w_{t}} = \\frac{7}{7} = 1@@@@1@7@@danf@17-8-2009 10090200@unknown@formal@none@1@S@Of the seven words in the candidate translation, all of them appear in the reference translations.@@@@1@16@@danf@17-8-2009 10090210@unknown@formal@none@1@S@This presents a problem for a metric, as the candidate translation above is complete nonsense, retaining none of the content of either of the references.@@@@1@25@@danf@17-8-2009 10090220@unknown@formal@none@1@S@The modification that BLEU makes is fairly straightforward.@@@@1@8@@danf@17-8-2009 10090230@unknown@formal@none@1@S@For each word in the candidate translation, the algorithm takes the maximum total count in the reference translations.@@@@1@18@@danf@17-8-2009 10090240@unknown@formal@none@1@S@Taking the example above, the word 'the' appears twice in reference 1, and once in reference 2.@@@@1@17@@danf@17-8-2009 10090250@unknown@formal@none@1@S@The largest value is taken, in this case '2' as the "maximum reference count".@@@@1@14@@danf@17-8-2009 10090260@unknown@formal@none@1@S@For each of the words in the candidate translation, the count of the word is compared against the maximum reference count, and the lowest value is taken.@@@@1@27@@danf@17-8-2009 10090270@unknown@formal@none@1@S@In this case, the count of the word 'the' in the candidate translation is '7', while the maximum reference count for the word is '2'.@@@@1@25@@danf@17-8-2009 10090280@unknown@formal@none@1@S@This "modified count" is then divided by the total number of words in the candidate translation.@@@@1@16@@danf@17-8-2009 10090290@unknown@formal@none@1@S@In the above example, the modified unigram precision score would be,@@@@1@11@@danf@17-8-2009 10090300@unknown@formal@none@1@S@:P = \\frac{2}{7}@@@@1@3@@danf@17-8-2009 10090310@unknown@formal@none@1@S@The above method is used to calculate scores for each n.@@@@1@11@@danf@17-8-2009 10090320@unknown@formal@none@1@S@The value of n which has the "highest correlation with monolingual human judgements" was found to be 4.@@@@1@18@@danf@17-8-2009 10090330@unknown@formal@none@1@S@The unigram scores are found to account for the adequacy of the translation, in other words, how much information is retained in the translation.@@@@1@24@@danf@17-8-2009 10090340@unknown@formal@none@1@S@The longer n-gram scores account for the fluency of the translation, or to what extent it reads like "good English".@@@@1@20@@danf@17-8-2009 10090350@unknown@formal@none@1@S@The modification made to precision does not solve the problem of short translations.@@@@1@13@@danf@17-8-2009 10090360@unknown@formal@none@1@S@Short translations can produce very high precision scores, even using modified precision.@@@@1@12@@danf@17-8-2009 10090370@unknown@formal@none@1@S@An example of a candidate translation for the same references as above might be:@@@@1@14@@danf@17-8-2009 10090380@unknown@formal@none@1@S@:the cat@@@@1@2@@danf@17-8-2009 10090390@unknown@formal@none@1@S@In this example, the modified unigram precision would be,@@@@1@9@@danf@17-8-2009 10090400@unknown@formal@none@1@S@:P = \\frac{1}{2} + \\frac{1}{2} = \\frac{2}{2}@@@@1@7@@danf@17-8-2009 10090410@unknown@formal@none@1@S@as the word 'the' and the word 'cat' appear once each in the candidate, and the total number of words is two.@@@@1@22@@danf@17-8-2009 10090420@unknown@formal@none@1@S@The modified bigram precision
would be 1 / 1 as the bigram, "the cat" appears once in the candidate.@@@@1@19@@danf@17-8-2009 10090430@unknown@formal@none@1@S@It has been pointed out that precision is usually twinned with [[recall]] to overcome this problem , as the unigram recall of this example would be 2 / 6 or 2 / 7.@@@@1@33@@danf@17-8-2009 10090440@unknown@formal@none@1@S@The problem being that as there are multiple reference translations, a bad translation could easily have an inflated recall, such as a translation which consisted of all the words in each of the references.@@@@1@34@@danf@17-8-2009 10090450@unknown@formal@none@1@S@In order to produce a score for the whole corpus, the modified precision scores for the segments are combined using the [[geometric mean]], multiplied by a brevity penalty, whose purpose is to prevent very short candidates from receiving too high a score.@@@@1@42@@danf@17-8-2009 10090460@unknown@formal@none@1@S@Let r be the total length of the reference corpus, and c the total length of the translation corpus.@@@@1@19@@danf@17-8-2009 10090470@unknown@formal@none@1@S@If c \\leq r, the brevity penalty applies and is defined to be e^{(1-r/c)}.@@@@1@14@@danf@17-8-2009 10090480@unknown@formal@none@1@S@(In the case of multiple reference sentences, r is taken to be the sum of the lengths of the sentences whose lengths are closest to the lengths of the candidate sentences.@@@@1@31@@danf@17-8-2009 10090490@unknown@formal@none@1@S@However, in the version of the metric used by [[NIST]], the short reference sentence is used.)@@@@1@16@@danf@17-8-2009 10090500@unknown@formal@none@1@S@==Performance==@@@@1@1@@danf@17-8-2009 10090510@unknown@formal@none@1@S@BLEU has frequently been reported as correlating well with human judgement, and certainly remains a benchmark for any new evaluation metric to beat.@@@@1@23@@danf@17-8-2009 10090520@unknown@formal@none@1@S@There are however a number of criticisms that have been voiced.@@@@1@11@@danf@17-8-2009 10090530@unknown@formal@none@1@S@It has been noted that while in theory capable of evaluating any language, BLEU does not in the present form work on languages without word boundaries.@@@@1@26@@danf@17-8-2009 10090540@unknown@formal@none@1@S@It has been argued that although BLEU certainly has significant advantages, there is no guarantee that an increase in BLEU score is an indicator of improved translation quality.@@@@1@28@@danf@17-8-2009 10090550@unknown@formal@none@1@S@As BLEU scores are taken at the corpus level, it is difficult to give a textual example.@@@@1@17@@danf@17-8-2009 10090560@unknown@formal@none@1@S@Nevertheless, they highlight two instances where BLEU seriously underperformed.@@@@1@9@@danf@17-8-2009 10090570@unknown@formal@none@1@S@These were the 2005 [[NIST]] evaluations where a number of different machine translation systems were tested, and their study of the [[SYSTRAN]] engine versus two engines using [[statistical machine translation]] (SMT) techniques.@@@@1@32@@danf@17-8-2009 10090580@unknown@formal@none@1@S@In the 2005 NIST evaluation, they report that the scores generated by BLEU failed to correspond to the scores produced in the human evaluations.@@@@1@24@@danf@17-8-2009 10090590@unknown@formal@none@1@S@The system which was ranked highest by the human judges was only ranked 6th by BLEU.@@@@1@16@@danf@17-8-2009 10090600@unknown@formal@none@1@S@In their study, they compared SMT systems with SYSTRAN, a knowledge based system.@@@@1@13@@danf@17-8-2009 10090610@unknown@formal@none@1@S@The scores from BLEU for SYSTRAN were substantially worse 
than the scores given to SYSTRAN by the human judges.@@@@1@19@@danf@17-8-2009 10090620@unknown@formal@none@1@S@They note that the SMT systems were trained using BLEU minimum error rate training, and point out that this could be one of the reasons behind the difference.@@@@1@28@@danf@17-8-2009 10090630@unknown@formal@none@1@S@They conclude by recommending that BLEU be used in a more restricted manner, for comparing the results from two similar systems, and for tracking "broad, incremental changes to a single system".@@@@1@31@@danf@17-8-2009 10100010@unknown@formal@none@1@S@
Business intelligence
@@@@1@2@@danf@17-8-2009 10100020@unknown@formal@none@1@S@'''Business intelligence''' ('''BI''') refers to technologies, applications and practices for the collection, integration, analysis, and presentation of business [[information]] and sometimes to the information itself.@@@@1@25@@danf@17-8-2009 10100030@unknown@formal@none@1@S@The purpose of business intelligence--a term that dates at least to 1958--is to support better business decision making.@@@@1@18@@danf@17-8-2009 10100040@unknown@formal@none@1@S@Thus, BI is also described as a [[decision support system]] (DSS):@@@@1@11@@danf@17-8-2009 10100050@unknown@formal@none@1@S@
BI is sometimes used interchangeably with briefing books, report and query tools and executive information systems.@@@@1@17@@danf@17-8-2009 10100060@unknown@formal@none@1@S@In general, business intelligence systems are data-driven DSS.
@@@@1@8@@danf@17-8-2009 10100070@unknown@formal@none@1@S@BI systems provide historical, current, and predictive views of business operations, most often using data that has been gathered into a [[data warehouse]] or a [[data mart]] and occasionally working from operational data.@@@@1@33@@danf@17-8-2009 10100080@unknown@formal@none@1@S@Software elements support the use of this information by assisting in the extraction, analysis, and reporting of information.@@@@1@18@@danf@17-8-2009 10100090@unknown@formal@none@1@S@Applications tackle sales, production, financial, and many other sources of business data for purposes that include, notably, [[business performance management]].@@@@1@20@@danf@17-8-2009 10100100@unknown@formal@none@1@S@Information may be gathered on comparable companies to produce [[benchmarking|benchmarks]].@@@@1@10@@danf@17-8-2009 10100110@unknown@formal@none@1@S@==History==@@@@1@1@@danf@17-8-2009 10100120@unknown@formal@none@1@S@Prior to the start of the [[Information Age]] in the late 20th century, businesses had to collect data from non-automated sources.@@@@1@21@@danf@17-8-2009 10100130@unknown@formal@none@1@S@Businesses then lacked the computing resources necessary to properly analyze the data, and as a result, companies often made business decisions primarily on the basis of [[intuition (knowledge)|intuition]].@@@@1@28@@danf@17-8-2009 10100140@unknown@formal@none@1@S@As businesses automated systems the amount of data increased but its collection remained difficult due to the inability of information to be moved between or within systems.@@@@1@27@@danf@17-8-2009 10100150@unknown@formal@none@1@S@Analysis of information informed for long-term decision making, but was slow and often required the use of instinct or expertise to make short-term decisions.@@@@1@24@@danf@17-8-2009 10100160@unknown@formal@none@1@S@Business intelligence was defined in 1958 by [[Hans Peter Luhn]], who wrote,@@@@1@12@@danf@17-8-2009 10100170@unknown@formal@none@1@S@
In this paper, business is a collection of activities carried on for whatever purpose, be it science, technology, commerce, industry, law, government, defense, et cetera.@@@@1@26@@danf@17-8-2009 10100180@unknown@formal@none@1@S@The communication facility serving the conduct of a business (in the broad sense) may be referred to as an intelligence system.@@@@1@21@@danf@17-8-2009 10100190@unknown@formal@none@1@S@The notion of intelligence is also defined here, in a more general sense, as "the ability to apprehend the interrelationships of presented facts in such a way as to guide action towards a desired goal."
@@@@1@35@@danf@17-8-2009 10100200@unknown@formal@none@1@S@In 1989 Howard Dresner, later a [[Gartner Group]] analyst, popularized BI as an umbrella term to describe "concepts and methods to improve business decision making by using fact-based support systems."@@@@1@30@@danf@17-8-2009 10100210@unknown@formal@none@1@S@In modern businesses the use of standards, automation and specialized software, including [[Online analytical processing|analytical tools]], allows large volumes of data to be [[Extract, transform, load|extracted, transformed, loaded]] and [[Data warehouse|warehoused]] to greatly increase the speed at which information becomes available for decision-making.@@@@1@43@@danf@17-8-2009 10100220@unknown@formal@none@1@S@===Key intelligence topics===@@@@1@3@@danf@17-8-2009 10100230@unknown@formal@none@1@S@Business intelligence often uses [[key performance indicators]] (KPIs) to assess the present state of business and to prescribe a course of action.@@@@1@22@@danf@17-8-2009 10100240@unknown@formal@none@1@S@Examples of KPIs are things such as lead conversion rate (in sales) and inventory turnover (in inventory management).@@@@1@18@@danf@17-8-2009 10100250@unknown@formal@none@1@S@Prior to the widespread adoption of computer and web applications, when information had to be manually input and calculated, performance data was often not available for weeks or months.@@@@1@29@@danf@17-8-2009 10100260@unknown@formal@none@1@S@Recently, banks have tried to make data available at shorter intervals and have reduced delays.@@@@1@15@@danf@17-8-2009 10100270@unknown@formal@none@1@S@The KPI methodology was further expanded with the Chief Performance Officer methodology which incorporated KPIs and root cause analysis into a single methodology.@@@@1@23@@danf@17-8-2009 10100280@unknown@formal@none@1@S@Businesses that face higher operational/[[credit risk]] loading, such as [[credit card]] companies and "wealth management" services, often make KPI-related data available weekly.@@@@1@22@@danf@17-8-2009 10100290@unknown@formal@none@1@S@In some cases, companies may even offer a daily analysis of data.@@@@1@12@@danf@17-8-2009 10100300@unknown@formal@none@1@S@This fast pace requires analysts to use [[information technology|IT]] [[system]]s to process this large volume of data.@@@@1@17@@danf@17-8-2009 10110010@unknown@formal@none@1@S@
Chatterbot
@@@@1@1@@danf@17-8-2009 10110020@unknown@formal@none@1@S@A '''chatterbot''' (or chatbot) is a type of conversational agent, a [[computer program]] designed to simulate an intelligent [[conversation]] with one or more human users via auditory or textual methods.@@@@1@30@@danf@17-8-2009 10110030@unknown@formal@none@1@S@In other words, a chatterbot is a computer program with artificial intelligence to talk to people through voices or typed words.@@@@1@21@@danf@17-8-2009 10110040@unknown@formal@none@1@S@Though many appear to be intelligently interpreting the human input prior to providing a response, most chatterbots simply scan for keywords within the input and pull a reply with the most matching keywords or the most similar wording pattern from a local [[database]].@@@@1@43@@danf@17-8-2009 10110050@unknown@formal@none@1@S@Chatterbots may also be referred to as ''talk bots'', ''chat bots'', or ''chatterboxes''.@@@@1@13@@danf@17-8-2009 10110060@unknown@formal@none@1@S@== Method of operation ==@@@@1@5@@danf@17-8-2009 10110070@unknown@formal@none@1@S@A good understanding of a conversation is required to carry on a meaningful dialog but most chatterbots do not attempt this.@@@@1@21@@danf@17-8-2009 10110080@unknown@formal@none@1@S@Instead they "converse" by recognizing cue words or phrases from the human user, which allows them to use pre-prepared or pre-calculated responses which can move the conversation on in an apparently meaningful way without requiring them to know what they are talking about.@@@@1@43@@danf@17-8-2009 10110090@unknown@formal@none@1@S@For example, if a human types, "I am feeling very worried lately," the chatterbot may be programmed to recognize the phrase "I am" and respond by replacing it with "Why are you" plus a question mark at the end, giving the answer, "Why are you feeling very worried lately?"@@@@1@49@@danf@17-8-2009 10110100@unknown@formal@none@1@S@A similar approach using keywords would be for the program to answer any comment including ''(Name of celebrity)'' with "I think they're great, don't you?"@@@@1@25@@danf@17-8-2009 10110110@unknown@formal@none@1@S@Humans, especially those unfamiliar with chatterbots, sometimes find the resulting conversations engaging.@@@@1@12@@danf@17-8-2009 10110120@unknown@formal@none@1@S@Critics of chatterbots call this engagement the [[ELIZA effect]].@@@@1@9@@danf@17-8-2009 10110130@unknown@formal@none@1@S@Some programs classified as chatterbots use other principles.@@@@1@8@@danf@17-8-2009 10110140@unknown@formal@none@1@S@One example is [[Jabberwacky]], which attempts to model the way humans learn new facts and language.@@@@1@16@@danf@17-8-2009 10110150@unknown@formal@none@1@S@[[Ellaz Systems|ELLA]] attempts to use [[natural language processing]] to make more useful responses from a human's input.@@@@1@17@@danf@17-8-2009 10110160@unknown@formal@none@1@S@Some programs that use natural language conversation, such as [[SHRDLU]], are not generally classified as chatterbots because they link their speech ability to knowledge of a simulated world.@@@@1@28@@danf@17-8-2009 10110170@unknown@formal@none@1@S@This type of link requires a more complex [[artificial intelligence]] (eg., a "vision" system) than standard chatterbots have.@@@@1@18@@danf@17-8-2009 10110180@unknown@formal@none@1@S@== Early chatterbots ==@@@@1@4@@danf@17-8-2009 10110190@unknown@formal@none@1@S@The classic early chatterbots are [[ELIZA]] and [[PARRY]].@@@@1@8@@danf@17-8-2009 10110200@unknown@formal@none@1@S@More recent programs are [[Racter]], [[Verbot]]s, [[Artificial 
Linguistic Internet Computer Entity|A.L.I.C.E.]], and [[Ellaz Systems|ELLA]].@@@@1@14@@danf@17-8-2009 10110210@unknown@formal@none@1@S@The growth of chatterbots as a research field has created an expansion in their purposes.@@@@1@15@@danf@17-8-2009 10110220@unknown@formal@none@1@S@While ELIZA and PARRY were used exclusively to simulate typed conversation, [[Racter]] was used to "write" a story called ''The Policeman's Beard is Half Constructed''.@@@@1@25@@danf@17-8-2009 10110230@unknown@formal@none@1@S@ELLA includes a collection of games and functional features to further extend the potential of chatterbots.@@@@1@16@@danf@17-8-2009 10110240@unknown@formal@none@1@S@The term "ChatterBot" was coined by [[Michael Loren Mauldin|Michael Mauldin]] (Creator of the first [[Verbot]], Julia) in 1994 to describe these conversational programs.@@@@1@23@@danf@17-8-2009 10110250@unknown@formal@none@1@S@== Malicious chatterbots ==@@@@1@4@@danf@17-8-2009 10110260@unknown@formal@none@1@S@Malicious chatterbots are frequently used to fill chat rooms with spam and advertising, or to entice people into revealing personal information, such as bank account numbers.@@@@1@26@@danf@17-8-2009 10110270@unknown@formal@none@1@S@They are commonly found on [[Yahoo! Messenger]], [[Windows Live Messenger]], [[AOL Instant Messenger]] and other [[instant messaging]] protocols.@@@@1@18@@danf@17-8-2009 10110280@unknown@formal@none@1@S@There has been a published report of a chatterbot used in a fake personal ad on a dating service's website.@@@@1@20@@danf@17-8-2009 10110290@unknown@formal@none@1@S@==Chatterbots in modern AI==@@@@1@4@@danf@17-8-2009 10110300@unknown@formal@none@1@S@Most modern AI research focuses on practical engineering tasks.@@@@1@9@@danf@17-8-2009 10110310@unknown@formal@none@1@S@This is known as weak AI and is distinguished from [[strong AI]], which would require [[sapience]] and reasoning abilities.@@@@1@19@@danf@17-8-2009 10110320@unknown@formal@none@1@S@One pertinent field of AI research is natural language.@@@@1@9@@danf@17-8-2009 10110330@unknown@formal@none@1@S@Usually weak AI fields employ specialised software or programming languages created for them.@@@@1@13@@danf@17-8-2009 10110340@unknown@formal@none@1@S@For example, one of the 'most-human' natural language chatterbots, [[Artificial Linguistic Internet Computer Entity|A.L.I.C.E.]], uses a programming language called AIML that is specific to its program, and its various clones, named Alicebots.@@@@1@32@@danf@17-8-2009 10110350@unknown@formal@none@1@S@Nevertheless, A.L.I.C.E. 
is still based on pattern matching without any reasoning.@@@@1@11@@danf@17-8-2009 10110360@unknown@formal@none@1@S@This is the same technique [[ELIZA]], the first chatterbot, was using back in 1966.@@@@1@14@@danf@17-8-2009 10110370@unknown@formal@none@1@S@Australian company MyCyberTwin also deals in strong AI, allowing users to create and sustain their own virtual personalities online.@@@@1@19@@danf@17-8-2009 10110380@unknown@formal@none@1@S@MyCyberTwin.com also works in a corporate setting, allowing companies to set up Virtual AI Assistants.@@@@1@15@@danf@17-8-2009 10110390@unknown@formal@none@1@S@Another notable program, known as [[Jabberwacky]], also deals in strong AI, as it is claimed to learn new responses based on user interactions, rather than being driven from a static database like many other existing chatterbots.@@@@1@36@@danf@17-8-2009 10110400@unknown@formal@none@1@S@Although such programs show initial promise, many of the existing results in trying to tackle the problem of natural language still appear fairly poor, and it seems reasonable to state that there is currently no general purpose conversational artificial intelligence.@@@@1@40@@danf@17-8-2009 10110410@unknown@formal@none@1@S@This has led some software developers to focus more on the practical aspect of chatterbot technology - information retrieval.@@@@1@19@@danf@17-8-2009 10110420@unknown@formal@none@1@S@A common rebuttal often used within the AI community against criticism of such approaches asks, "How do we know that humans don't also just follow some cleverly devised rules?" (in the way that Chatterbots do).@@@@1@35@@danf@17-8-2009 10110430@unknown@formal@none@1@S@Two famous examples of this line of argument against the rationale for the basis of the Turing test are John Searle's [[Chinese room]] argument and Ned Block's [[Intentional stance|Blockhead argument]].@@@@1@30@@danf@17-8-2009 10110440@unknown@formal@none@1@S@==Chatterbots/Virtual Assistants in Commercial Environments==@@@@1@5@@danf@17-8-2009 10110450@unknown@formal@none@1@S@Automated Conversational Systems have progressed and evolved far from the original designs of the first widely used chatbots.@@@@1@18@@danf@17-8-2009 10110460@unknown@formal@none@1@S@In the UK, large commercial entities such as Lloyds TSB, Royal Bank of Scotland, Renault, Citroën and One Railway are already utilizing Virtual Assistants to reduce expenditures on Call Centres and provide a first point of contact that can inform the user exactly of points of interest, provide support, capture data from the user and promote products for sale.@@@@1@59@@danf@17-8-2009 10110470@unknown@formal@none@1@S@In the UK, new projects and research are being conducted to introduce a Virtual Assistant into the classroom to assist the teacher.@@@@1@22@@danf@17-8-2009 10110480@unknown@formal@none@1@S@This project is the first of its kind and the chatbot VA in question is based on the Yhaken [http://www.elzware.com] chatbot design.@@@@1@22@@danf@17-8-2009 10110490@unknown@formal@none@1@S@The Yhaken template provides a further move forward in Automated Conversational Systems with features such as complex conversational routing and responses, well defined personality, a complex hierarchical construct with additional external reference points, emotional responses and in depth small talk, all to make the experience more interactive and involving for the user.@@@@1@52@@danf@17-8-2009 10110500@unknown@formal@none@1@S@==Annual contests for chatterbots==@@@@1@4@@danf@17-8-2009 
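To make the keyword-substitution technique described under "Method of operation" above concrete, here is a minimal, hypothetical C sketch; the cue string, the fallback reply and the function name reply() are invented for illustration and are not taken from ELIZA, A.L.I.C.E. or any other program mentioned here. The sketch simply looks for the cue "I am " at the start of the input and, when it is found, echoes the remainder of the input back inside a question, as in the "Why are you feeling very worried lately?" example.

 #include <stdio.h>
 #include <string.h>

 /* ELIZA-style keyword substitution: no understanding, just a cue match
    and a reworded echo of the user's input. */
 static void reply(const char *input) {
     const char *cue = "I am ";
     if (strncmp(input, cue, strlen(cue)) == 0) {
         /* "I am feeling very worried lately" -> "Why are you feeling very worried lately?" */
         printf("Why are you %s?\n", input + strlen(cue));
     } else {
         /* No cue matched: fall back to a generic, content-free reply. */
         printf("Tell me more.\n");
     }
 }

 int main(void) {
     reply("I am feeling very worried lately");
     reply("The weather is nice today");
     return 0;
 }

Because the replies are produced purely by string manipulation, such a program can appear to converse while knowing nothing about what is being said, which is the limitation noted above for pattern-matching chatterbots.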
10110510@unknown@formal@none@1@S@Many organizations try to encourage and support developers all over the world to develop chatterbots that are able to perform a variety of tasks and compete with each other through [[Turing test]]s and other challenges.@@@@1@32@@danf@17-8-2009 10110520@unknown@formal@none@1@S@Annual contests are organized at the following links:@@@@1@8@@danf@17-8-2009 10110530@unknown@formal@none@1@S@*[http://www.chatterboxchallenge.com The Chatterbox Challenge]@@@@1@4@@danf@17-8-2009 10110540@unknown@formal@none@1@S@*[http://www.loebner.net/Prizef/loebner-prize.html The Loebner Prize]@@@@1@4@@danf@17-8-2009 10120010@unknown@formal@none@1@S@
Computational linguistics
@@@@1@2@@danf@17-8-2009 10120020@unknown@formal@none@1@S@'''Computational linguistics''' is an [[interdisciplinary]] field dealing with the [[Statistics|statistical]] and/or rule-based modeling of [[natural language]] from a computational perspective.@@@@1@20@@danf@17-8-2009 10120030@unknown@formal@none@1@S@This modeling is not limited to any particular field of [[linguistics]].@@@@1@11@@danf@17-8-2009 10120040@unknown@formal@none@1@S@Traditionally, computational linguistics was usually performed by [[computer scientist]]s who had specialized in the application of computers to the processing of a [[natural language]].@@@@1@24@@danf@17-8-2009 10120050@unknown@formal@none@1@S@Recent research has shown that human language is much more complex than previously thought, so computational linguists often work as members of interdisciplinary teams, including linguists (specifically trained in linguistics), language experts (persons with some level of ability in the languages relevant to a given project), and computer scientists.@@@@1@49@@danf@17-8-2009 10120060@unknown@formal@none@1@S@In general computational linguistics draws upon the involvement of linguists, [[computer science|computer scientists]], experts in [[artificial intelligence]], [[cognitive psychology|cognitive psychologists]], [[math]]ematicians, and [[logic]]ians, amongst others.@@@@1@25@@danf@17-8-2009 10120070@unknown@formal@none@1@S@==Origins==@@@@1@1@@danf@17-8-2009 10120080@unknown@formal@none@1@S@Computational linguistics as a field predates [[artificial intelligence]], a field under which it is often grouped.@@@@1@16@@danf@17-8-2009 10120090@unknown@formal@none@1@S@Computational linguistics originated with efforts in the [[United States]] in the 1950s to use computers to automatically translate texts from foreign languages, particularly [[Russian language|Russian]] scientific journals, into English.@@@@1@29@@danf@17-8-2009 10120100@unknown@formal@none@1@S@Since computers had proven their ability to do [[arithmetic]] much faster and more accurately than humans, it was thought to be only a short matter of time before the technical details could be taken care of that would allow them the same remarkable capacity to process language.@@@@1@47@@danf@17-8-2009 10120110@unknown@formal@none@1@S@When [[machine translation]] (also known as mechanical translation) failed to yield accurate translations right away, automated processing of human languages was recognized as far more complex than had originally been assumed.@@@@1@31@@danf@17-8-2009 10120120@unknown@formal@none@1@S@Computational linguistics was born as the name of the new field of study devoted to developing [[algorithm]]s and [[software]] for intelligently processing language data.@@@@1@24@@danf@17-8-2009 10120130@unknown@formal@none@1@S@When artificial intelligence came into existence in the 1960s, the field of computational linguistics became that sub-division of artificial intelligence dealing with human-level comprehension and production of natural languages.@@@@1@29@@danf@17-8-2009 10120140@unknown@formal@none@1@S@In order to translate one language into another, it was observed that one had to understand the [[grammar]] of both languages, including both [[morphology (linguistics)|morphology]] (the grammar of word forms) and [[syntax]] (the grammar of sentence structure).@@@@1@37@@danf@17-8-2009 10120150@unknown@formal@none@1@S@In order to understand syntax, one had to also understand the [[semantics]] and the [[lexicon]] (or 'vocabulary'), and even to understand something 
of the [[pragmatics]] of language use.@@@@1@28@@danf@17-8-2009 10120160@unknown@formal@none@1@S@Thus, what started as an effort to translate between languages evolved into an entire discipline devoted to understanding how to represent and process natural languages using computers.@@@@1@27@@danf@17-8-2009 10120170@unknown@formal@none@1@S@==Subfields==@@@@1@1@@danf@17-8-2009 10120180@unknown@formal@none@1@S@Computational linguistics can be divided into major areas depending upon the medium of the language being processed, whether spoken or textual; and upon the task being performed, whether analyzing language (recognition) or synthesizing language (generation).@@@@1@35@@danf@17-8-2009 10120190@unknown@formal@none@1@S@[[Speech recognition]] and [[speech synthesis]] deal with how spoken language can be understood or created using computers.@@@@1@17@@danf@17-8-2009 10120200@unknown@formal@none@1@S@Parsing and generation are sub-divisions of computational linguistics dealing respectively with taking language apart and putting it together.@@@@1@18@@danf@17-8-2009 10120210@unknown@formal@none@1@S@Machine translation remains the sub-division of computational linguistics dealing with having computers translate between languages.@@@@1@15@@danf@17-8-2009 10120220@unknown@formal@none@1@S@Some of the areas of research that are studied by computational linguistics include:@@@@1@13@@danf@17-8-2009 10120230@unknown@formal@none@1@S@*Computer aided [[corpus linguistics]]@@@@1@4@@danf@17-8-2009 10120240@unknown@formal@none@1@S@*Design of [[parser]]s or [[phrase chunking|chunkers]] for [[natural language]]s@@@@1@9@@danf@17-8-2009 10120250@unknown@formal@none@1@S@*Design of taggers like [[Part-of-speech tagging|POS-taggers (part-of-speech taggers)]]@@@@1@8@@danf@17-8-2009 10120260@unknown@formal@none@1@S@*Definition of specialized logics like resource logics for [[Natural language processing|NLP]]@@@@1@11@@danf@17-8-2009 10120270@unknown@formal@none@1@S@*Research in the relation between formal and natural languages in general@@@@1@11@@danf@17-8-2009 10120280@unknown@formal@none@1@S@*[[Machine translation]], e.g. by a translating computer@@@@1@7@@danf@17-8-2009 10120290@unknown@formal@none@1@S@*[[Computational complexity]] of natural language, largely modeled on [[automata theory]], with the application of [[context-sensitive grammar]] and [[Linear bounded automaton|linearly-bounded]] [[Turing machine]]s.@@@@1@22@@danf@17-8-2009 10120300@unknown@formal@none@1@S@The [[Association for Computational Linguistics]] defines computational linguistics as:@@@@1@9@@danf@17-8-2009 10120310@unknown@formal@none@1@S@:...the scientific study of [[language]] from a computational perspective.@@@@1@9@@danf@17-8-2009 10120320@unknown@formal@none@1@S@Computational linguists are interested in providing [[computational model]]s of various kinds of linguistic phenomena.@@@@1@14@@danf@17-8-2009 10130010@unknown@formal@none@1@S@
Computer program
@@@@1@2@@danf@17-8-2009 10130020@unknown@formal@none@1@S@'''Computer programs''' (also '''[[Computer software|software programs]]''', or just '''programs''') are [[Instruction (computer science)|instructions]] for a [[computer]].@@@@1@16@@danf@17-8-2009 10130030@unknown@formal@none@1@S@A computer requires programs to function, and a computer program does nothing unless its instructions are executed by a [[Central processing unit|central processor]].@@@@1@23@@danf@17-8-2009 10130040@unknown@formal@none@1@S@Computer programs are usually [[executable]] programs or the [[source code]] from which executable programs are derived (e.g., [[compiler|compiled]]).@@@@1@18@@danf@17-8-2009 10130050@unknown@formal@none@1@S@Computer source code is often written by professional [[computer programmer]]s.@@@@1@10@@danf@17-8-2009 10130060@unknown@formal@none@1@S@Source code is written in a [[programming language]] that usually follows one of two main [[Programming paradigm|paradigms]]: [[imperative programming|imperative]] or [[declarative language|declarative]] programming.@@@@1@23@@danf@17-8-2009 10130070@unknown@formal@none@1@S@Source code may be converted into an [[executable file]] (sometimes called an executable program) by a [[compiler]].@@@@1@17@@danf@17-8-2009 10130080@unknown@formal@none@1@S@Alternatively, computer programs may be executed by a [[central processing unit]] with the aid of an [[Interpreter (computing)|interpreter]], or may be [[firmware|embedded]] directly into [[Computer hardware|hardware]].@@@@1@26@@danf@17-8-2009 10130090@unknown@formal@none@1@S@Computer programs may be categorized along functional lines: [[system software]] and [[application software]].@@@@1@13@@danf@17-8-2009 10130100@unknown@formal@none@1@S@And many computer programs may run simultaneously on a single computer, a process known as [[computer multitasking|multitasking]].@@@@1@17@@danf@17-8-2009 10130110@unknown@formal@none@1@S@==Programming==@@@@1@1@@danf@17-8-2009 10130120@unknown@formal@none@1@S@ main() {output_string("Hello world!");} @@@@1@5@@danf@17-8-2009 10130160@unknown@formal@none@1@S@Source code of a program written in the [[C programming language]]@@@@1@11@@danf@17-8-2009 10130170@unknown@formal@none@1@S@[[Computer programming]] is the iterative process of writing or editing [[source code]].@@@@1@12@@danf@17-8-2009 10130180@unknown@formal@none@1@S@Editing source code involves testing, analyzing, and refining.@@@@1@8@@danf@17-8-2009 10130190@unknown@formal@none@1@S@A person who practices this skill is referred to as a computer [[programmer]] or software developer.@@@@1@16@@danf@17-8-2009 10130200@unknown@formal@none@1@S@The sometimes lengthy process of computer programming is usually referred to as [[software development]].@@@@1@14@@danf@17-8-2009 10130210@unknown@formal@none@1@S@The term [[software engineering]] is becoming popular as the process is seen as an [[engineering]] discipline.@@@@1@16@@danf@17-8-2009 10130220@unknown@formal@none@1@S@=== Paradigms ===@@@@1@3@@danf@17-8-2009 10130230@unknown@formal@none@1@S@Computer programs can be categorized by the [[programming language]] [[Programming paradigm|paradigm]] used to produce them.@@@@1@15@@danf@17-8-2009 10130240@unknown@formal@none@1@S@Two of the main paradigms are [[imperative programming|imperative]] and [[declarative language|declarative]].@@@@1@11@@danf@17-8-2009 10130250@unknown@formal@none@1@S@Programs written using an imperative language specify an [[algorithm]] using declarations, expressions, and statements.@@@@1@14@@danf@17-8-2009 
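To make the imperative style concrete before the definitions that follow, here is a minimal, hypothetical C sketch (C is also the language of the source-code fragment shown earlier); it contains a declaration, an expression, and statements that assign a value and alter control flow, and the helper do_something() is invented purely for this example.

 #include <stdio.h>

 /* Hypothetical helper, invented for the example. */
 static void do_something(void) {
     printf("x is 4\n");
 }

 int main(void) {
     int x;         /* declaration: associates the name x with the type int */
     x = 2 + 2;     /* statement: assigns the value of the expression 2 + 2, which yields 4 */
     if (x == 4) {  /* statement: uses the value of x to alter the program's control flow */
         do_something();
     }
     return 0;
 }

The Pascal-like fragments quoted below (var x: integer; x := 2 + 2; if x = 4 then do_something();) express the same three steps in a different imperative syntax.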
10130260@unknown@formal@none@1@S@A declaration associates a [[variable]] name with a [[datatype]].@@@@1@9@@danf@17-8-2009 10130270@unknown@formal@none@1@S@For example: var x: integer; .@@@@1@7@@danf@17-8-2009 10130280@unknown@formal@none@1@S@An expression yields a value.@@@@1@5@@danf@17-8-2009 10130290@unknown@formal@none@1@S@For example: 2 + 2 yields 4.@@@@1@9@@danf@17-8-2009 10130300@unknown@formal@none@1@S@Finally, a statement might assign an expression to a variable or use the value of a variable to alter the program's control flow.@@@@1@23@@danf@17-8-2009 10130310@unknown@formal@none@1@S@For example: x := 2 + 2; if x = 4 then do_something();@@@@1@13@@danf@17-8-2009 10130315@unknown@formal@none@1@S@One criticism of imperative languages is the side-effect of an assignment statement on a class of variables called non-local variables.@@@@1@20@@danf@17-8-2009 10130320@unknown@formal@none@1@S@Programs written using a declarative language specify the properties that have to be met by the output and do not specify any implementation details.@@@@1@24@@danf@17-8-2009 10130330@unknown@formal@none@1@S@Two broad categories of declarative languages are [[functional language]]s and [[logical language]]s.@@@@1@12@@danf@17-8-2009 10130340@unknown@formal@none@1@S@The principle behind functional languages (like [[Haskell (programming language)|Haskell]]) is to not allow side-effects, which makes it easier to reason about programs like mathematical functions.@@@@1@25@@danf@17-8-2009 10130350@unknown@formal@none@1@S@The principle behind logical languages (like [[Prolog]]) is to define the problem to be solved — the goal — and leave the detailed solution to the Prolog system itself.@@@@1@29@@danf@17-8-2009 10130360@unknown@formal@none@1@S@The goal is defined by providing a list of subgoals.@@@@1@10@@danf@17-8-2009 10130370@unknown@formal@none@1@S@Then each subgoal is defined by further providing a list of its subgoals, etc.@@@@1@14@@danf@17-8-2009 10130380@unknown@formal@none@1@S@If a path of subgoals fails to find a solution, then that subgoal is [[Backtracking|backtracked]] and another path is systematically attempted.@@@@1@21@@danf@17-8-2009 10130390@unknown@formal@none@1@S@The form in which a program is created may be textual or visual.@@@@1@13@@danf@17-8-2009 10130400@unknown@formal@none@1@S@In a [[visual language]] program, elements are graphically manipulated rather than textually specified.@@@@1@13@@danf@17-8-2009 10130410@unknown@formal@none@1@S@===Compilation or interpretation===@@@@1@3@@danf@17-8-2009 10130420@unknown@formal@none@1@S@A ''computer program'' in the form of a [[human-readable]], computer programming language is called [[source code]].@@@@1@16@@danf@17-8-2009 10130430@unknown@formal@none@1@S@Source code may be converted into an [[executable file|executable image]] by a [[compiler]] or executed immediately with the aid of an [[Interpreter (computing)|interpreter]].@@@@1@23@@danf@17-8-2009 10130440@unknown@formal@none@1@S@Compiled computer programs are commonly referred to as executables, binary images, or simply as [[binary file|binaries]] — a reference to the [[binary numeral system|binary]] [[file format]] used to store the executable code.@@@@1@32@@danf@17-8-2009 10130450@unknown@formal@none@1@S@Compilers are used to translate source code from a programming language into either [[object file|object code]] or [[machine code]].@@@@1@19@@danf@17-8-2009 10130460@unknown@formal@none@1@S@Object code needs further processing to become machine code, and machine code is the [[Central 
processing unit|Central Processing Unit]]'s native [[microcode|code]], ready for execution.@@@@1@24@@danf@17-8-2009 10130470@unknown@formal@none@1@S@Interpreted computer programs are either decoded and then immediately executed or are decoded into some efficient intermediate representation for future execution.@@@@1@21@@danf@17-8-2009 10130480@unknown@formal@none@1@S@[[BASIC]], [[Perl]], and [[Python (programming language)|Python]] are examples of immediately executed computer programs.@@@@1@13@@danf@17-8-2009 10130490@unknown@formal@none@1@S@Alternatively, [[Java (programming language)|Java]] computer programs are compiled ahead of time and stored as a machine independent code called [[bytecode]].@@@@1@20@@danf@17-8-2009 10130500@unknown@formal@none@1@S@Bytecode is then executed upon request by an interpreter called a [[virtual machine]].@@@@1@13@@danf@17-8-2009 10130510@unknown@formal@none@1@S@The main disadvantage of interpreters is computer programs run slower than if compiled.@@@@1@13@@danf@17-8-2009 10130520@unknown@formal@none@1@S@Interpreting code is slower than running the compiled version because the interpreter must [[decode]] each [[Statement (programming)|statement]] each time it is loaded and then perform the desired action.@@@@1@28@@danf@17-8-2009 10130530@unknown@formal@none@1@S@On the other hand, software development may be quicker using an interpreter because testing is immediate when the compilation step is omitted.@@@@1@22@@danf@17-8-2009 10130540@unknown@formal@none@1@S@Another disadvantage of interpreters is the interpreter must be present on the computer at the time the computer program is executed.@@@@1@21@@danf@17-8-2009 10130550@unknown@formal@none@1@S@Alternatively, compiled computer programs need not have the compiler present at the time of execution.@@@@1@15@@danf@17-8-2009 10130560@unknown@formal@none@1@S@No properties of a programming language require it to be exclusively compiled or exclusively interpreted.@@@@1@15@@danf@17-8-2009 10130570@unknown@formal@none@1@S@The categorization usually reflects the most popular method of language execution.@@@@1@11@@danf@17-8-2009 10130580@unknown@formal@none@1@S@For example, BASIC is thought of as an interpreted language and C a compiled language, despite the existence of BASIC compilers and C interpreters.@@@@1@24@@danf@17-8-2009 10130590@unknown@formal@none@1@S@===Self-modifying programs===@@@@1@2@@danf@17-8-2009 10130600@unknown@formal@none@1@S@A computer program in [[execution (computers)|execution]] is normally treated as being different from the [[data (computing)|data]] the program operates on.@@@@1@20@@danf@17-8-2009 10130610@unknown@formal@none@1@S@However, in some cases this distinction is blurred when a computer program modifies itself.@@@@1@14@@danf@17-8-2009 10130620@unknown@formal@none@1@S@The modified computer program is subsequently executed as part of the same program.@@@@1@13@@danf@17-8-2009 10130630@unknown@formal@none@1@S@[[Self-modifying code]] is possible for programs written in [[Lisp programming language|Lisp]], [[cobol|COBOL]], and [[Prolog]].@@@@1@14@@danf@17-8-2009 10130640@unknown@formal@none@1@S@==Execution and storage==@@@@1@3@@danf@17-8-2009 10130650@unknown@formal@none@1@S@Typically, computer programs are stored in [[non-volatile memory]] until requested either directly or indirectly to be [[execution (computers)|executed]] by the computer user.@@@@1@22@@danf@17-8-2009 10130660@unknown@formal@none@1@S@Upon such a request, the program is loaded into [[random access memory]], by a computer 
program called an [[operating system]], where it can be accessed directly by the central processor.@@@@1@30@@danf@17-8-2009 10130670@unknown@formal@none@1@S@The central processor then executes ("runs") the program, instruction by instruction, until termination.@@@@1@13@@danf@17-8-2009 10130680@unknown@formal@none@1@S@A program in execution is called a [[Process (computing)|process]].@@@@1@9@@danf@17-8-2009 10130690@unknown@formal@none@1@S@Termination is either by normal self-termination or by error — software or hardware error.@@@@1@14@@danf@17-8-2009 10130700@unknown@formal@none@1@S@===Embedded programs===@@@@1@2@@danf@17-8-2009 10130710@unknown@formal@none@1@S@Some computer programs are embedded into hardware.@@@@1@7@@danf@17-8-2009 10130720@unknown@formal@none@1@S@A [[stored-program computer]] requires an initial computer program stored in its [[read-only memory]] to [[booting|boot]].@@@@1@15@@danf@17-8-2009 10130730@unknown@formal@none@1@S@The boot process is to identify and initialize all aspects of the system, from [[Processor register|CPU registers]] to [[Device driver|device controllers]] to [[Volatile memory|memory]] contents.@@@@1@25@@danf@17-8-2009 10130740@unknown@formal@none@1@S@Following the initialization process, this initial computer program loads the [[operating system]] and sets the [[program counter]] to begin normal operations.@@@@1@21@@danf@17-8-2009 10130750@unknown@formal@none@1@S@Independent of the host computer, a [[Peripheral|hardware device]] might have embedded [[firmware]] to control its operation.@@@@1@16@@danf@17-8-2009 10130760@unknown@formal@none@1@S@Firmware is used when the computer program is rarely or never expected to change, or when the program must not be lost when the power is off.@@@@1@27@@danf@17-8-2009 10130770@unknown@formal@none@1@S@===Manual programming===@@@@1@2@@danf@17-8-2009 10130780@unknown@formal@none@1@S@Computer programs historically were manually input to the central processor via switches.@@@@1@12@@danf@17-8-2009 10130790@unknown@formal@none@1@S@An instruction was represented by a configuration of on/off settings.@@@@1@10@@danf@17-8-2009 10130800@unknown@formal@none@1@S@After setting the configuration, an execute button was pressed.@@@@1@9@@danf@17-8-2009 10130810@unknown@formal@none@1@S@This process was then repeated.@@@@1@5@@danf@17-8-2009 10130820@unknown@formal@none@1@S@Computer programs also historically were manually input via [[paper tape]] or [[punched cards]].@@@@1@13@@danf@17-8-2009 10130830@unknown@formal@none@1@S@After the medium was loaded, the starting address was set via switches and the execute button pressed.@@@@1@17@@danf@17-8-2009 10130840@unknown@formal@none@1@S@===Automatic program generation===@@@@1@3@@danf@17-8-2009 10130850@unknown@formal@none@1@S@[[Generative programming]] is a style of [[computer programming]] that creates [[source code]] through [[generic programming|generic]] [[class (computer science)|classes]], [[Prototype-based programming|prototypes]], [[template (programming)|template]]s, [[aspect (computer science)|aspect]]s, and [[Code generation (compiler)|code generator]]s to improve [[programmer]] productivity.@@@@1@34@@danf@17-8-2009 10130860@unknown@formal@none@1@S@Source code is generated with [[programming tool]]s such as a [[template processor]] or an [[Integrated development environment|Integrated Development Environment]].@@@@1@19@@danf@17-8-2009 10130870@unknown@formal@none@1@S@The simplest form of source code generator is a [[Macro (computer science)|macro]] processor, such as the [[C 
preprocessor]], which replaces patterns in source code according to relatively simple rules.@@@@1@29@@danf@17-8-2009 10130880@unknown@formal@none@1@S@[[Software engine]]s output source code or [[Markup language|markup code]] that simultaneously become the input to another [[Process (computing)|computer process]].@@@@1@19@@danf@17-8-2009 10130890@unknown@formal@none@1@S@The analogy is that of one process driving another process, with the computer code being burned as fuel.@@@@1@18@@danf@17-8-2009 10130900@unknown@formal@none@1@S@[[Application server]]s are software engines that deliver applications to [[client computer]]s.@@@@1@11@@danf@17-8-2009 10130910@unknown@formal@none@1@S@For example, a [[Wiki software|Wiki]] is an application server that allows users to build [[dynamic web page|dynamic content]] assembled from [[article (publishing)|articles]].@@@@1@22@@danf@17-8-2009 10130920@unknown@formal@none@1@S@Wikis generate [[HTML]], [[CSS]], [[Java (programming language)|Java]], and [[Javascript]] which are then [[Interpreter (computing)|interpreted]] by a [[web browser]].@@@@1@18@@danf@17-8-2009 10130930@unknown@formal@none@1@S@=== Simultaneous execution===@@@@1@3@@danf@17-8-2009 10130940@unknown@formal@none@1@S@Many operating systems support [[computer multitasking|multitasking]] which enables many computer programs to appear to be running simultaneously on a single computer.@@@@1@21@@danf@17-8-2009 10130950@unknown@formal@none@1@S@Operating systems may run multiple programs through [[process scheduling]] — a software mechanism to [[Context switch|switch]] the CPU among processes frequently so that users can [[Time-sharing|interact]] with each program while it is running.@@@@1@33@@danf@17-8-2009 10130960@unknown@formal@none@1@S@Within hardware, modern day multiprocessor computers or computers with multicore processors may run multiple programs.@@@@1@15@@danf@17-8-2009 10130970@unknown@formal@none@1@S@== Functional categories ==@@@@1@4@@danf@17-8-2009 10130980@unknown@formal@none@1@S@Computer programs may be categorized along functional lines.@@@@1@8@@danf@17-8-2009 10130990@unknown@formal@none@1@S@These functional categories are [[system software]] and [[application software]].@@@@1@9@@danf@17-8-2009 10131000@unknown@formal@none@1@S@System software includes the [[operating system]] which couples the [[computer hardware|computer's hardware]] with the application software.@@@@1@16@@danf@17-8-2009 10131010@unknown@formal@none@1@S@The purpose of the operating system is to provide an environment in which application software executes in a convenient and efficient manner.@@@@1@22@@danf@17-8-2009 10131020@unknown@formal@none@1@S@In addition to the operating system, system software includes [[Utility software|utility programs]] that help manage and tune the computer.@@@@1@19@@danf@17-8-2009 10131030@unknown@formal@none@1@S@If a computer program is not system software then it is application software.@@@@1@13@@danf@17-8-2009 10131040@unknown@formal@none@1@S@Application software includes [[middleware]], which couples the system software with the [[user interface]].@@@@1@13@@danf@17-8-2009 10131050@unknown@formal@none@1@S@Application software also includes utility programs that help users solve application problems, like the need for sorting.@@@@1@17@@danf@17-8-2009 10140010@unknown@formal@none@1@S@
Computer science
@@@@1@2@@danf@17-8-2009 10140020@unknown@formal@none@1@S@'''Computer science''' (or '''computing science''') is the study and the [[science]] of the theoretical foundations of [[information]] and [[computation]] and their implementation and application in [[computer|computer system]]s.@@@@1@27@@danf@17-8-2009 10140030@unknown@formal@none@1@S@Computer science has many sub-fields; some emphasize the computation of specific results (such as [[computer graphics]]), while others relate to properties of [[computational problem]]s (such as [[computational complexity theory]]).@@@@1@29@@danf@17-8-2009 10140040@unknown@formal@none@1@S@Still others focus on the challenges in implementing computations.@@@@1@9@@danf@17-8-2009 10140050@unknown@formal@none@1@S@For example, [[programming language theory]] studies approaches to describing computations, while [[computer programming]] applies specific [[programming language]]s to solve specific computational problems.@@@@1@22@@danf@17-8-2009 10140060@unknown@formal@none@1@S@A further subfield, [[human-computer interaction]], focuses on the challenges in making computers and computations useful, usable and universally accessible to [[humans|people]].@@@@1@21@@danf@17-8-2009 10140070@unknown@formal@none@1@S@== History ==@@@@1@3@@danf@17-8-2009 10140080@unknown@formal@none@1@S@The early foundations of what would become computer science predate the invention of the modern [[digital computer]].@@@@1@17@@danf@17-8-2009 10140090@unknown@formal@none@1@S@Machines for calculating fixed numerical tasks, such as the [[abacus]], have existed since antiquity.@@@@1@14@@danf@17-8-2009 10140100@unknown@formal@none@1@S@[[Wilhelm Schickard]] built the first mechanical calculator in 1623.@@@@1@9@@danf@17-8-2009 10140110@unknown@formal@none@1@S@[[Charles Babbage]] designed a [[difference engine]] in [[Victorian era|Victorian]] times (between 1837 and 1901) helped by [[Ada Lovelace]].@@@@1@18@@danf@17-8-2009 10140120@unknown@formal@none@1@S@Around 1900, the [[IBM]] corporation sold [[Key_punch|punch-card machines]].@@@@1@8@@danf@17-8-2009 10140130@unknown@formal@none@1@S@However, all of these machines were constrained to perform a single task, or at best some subset of all possible tasks.@@@@1@21@@danf@17-8-2009 10140140@unknown@formal@none@1@S@During the 1940s, as newer and more powerful computing machines were developed, the term ''computer'' came to refer to the machines rather than their human predecessors.@@@@1@26@@danf@17-8-2009 10140150@unknown@formal@none@1@S@As it became clear that computers could be used for more than just mathematical calculations, the field of computer science broadened to study [[computation]] in general.@@@@1@26@@danf@17-8-2009 10140160@unknown@formal@none@1@S@Computer science began to be established as a distinct academic discipline in the 1960s, with the creation of the first computer science departments and degree programs.@@@@1@26@@danf@17-8-2009 10140170@unknown@formal@none@1@S@Since practical computers became available, many applications of computing have become distinct areas of study in their own right.@@@@1@19@@danf@17-8-2009 10140180@unknown@formal@none@1@S@Many initially believed it impossible that "computers themselves could actually be a scientific field of study" (Levy 1984, p. 
11), though it was in the "late fifties" (Levy 1984, p.11) that it gradually became accepted among the greater academic population.@@@@1@40@@danf@17-8-2009 10140190@unknown@formal@none@1@S@It is the now well-known IBM brand that formed part of the computer science revolution during this time.@@@@1@18@@danf@17-8-2009 10140200@unknown@formal@none@1@S@'IBM' (short for International Business Machines) released the IBM 704 and later the IBM 709 computers, which were widely used during the exploration period of such devices.@@@@1@27@@danf@17-8-2009 10140210@unknown@formal@none@1@S@"Still, working with the IBM [computer] was frustrating...if you had misplaced as much as one letter in one instruction, the program would crash, and you would have to start the whole process over again" (Levy 1984, p.13).@@@@1@37@@danf@17-8-2009 10140220@unknown@formal@none@1@S@During the late 1950s, the computer science discipline was very much in its developmental stages, and such issues were commonplace.@@@@1@20@@danf@17-8-2009 10140230@unknown@formal@none@1@S@Time has seen significant improvements in the useability and effectiveness of computer science technology.@@@@1@14@@danf@17-8-2009 10140240@unknown@formal@none@1@S@Modern society has seen a significant shift from computers being used solely by experts or professionals to a more widespread user base.@@@@1@22@@danf@17-8-2009 10140250@unknown@formal@none@1@S@By the 1990s, computers became accepted as being the norm within everyday life.@@@@1@13@@danf@17-8-2009 10140260@unknown@formal@none@1@S@During this time data entry was a primary component of the use of computers, many preferring to streamline their business practices through the use of a computer.@@@@1@27@@danf@17-8-2009 10140270@unknown@formal@none@1@S@This also gave the additional benefit of removing the need of large amounts of documentation and file records which consumed much-needed physical space within offices.@@@@1@25@@danf@17-8-2009 10140280@unknown@formal@none@1@S@== Major achievements ==@@@@1@4@@danf@17-8-2009 10140290@unknown@formal@none@1@S@Despite its relatively short history as a formal academic discipline, computer science has made a number of fundamental contributions to [[science]] and [[society]].@@@@1@23@@danf@17-8-2009 10140300@unknown@formal@none@1@S@These include:@@@@1@2@@danf@17-8-2009 10140310@unknown@formal@none@1@S@;Applications within computer science@@@@1@4@@danf@17-8-2009 10140320@unknown@formal@none@1@S@* A formal definition of [[computation]] and [[computability]], and proof that there are computationally [[Undecidable problem|unsolvable]] and [[Intractable#Intractability|intractable]] problems.@@@@1@19@@danf@17-8-2009 10140330@unknown@formal@none@1@S@* The concept of a [[programming language]], a tool for the precise expression of methodological information at various levels of abstraction.@@@@1@21@@danf@17-8-2009 10140340@unknown@formal@none@1@S@;Applications outside of computing@@@@1@4@@danf@17-8-2009 10140350@unknown@formal@none@1@S@* Sparked the [[Digital Revolution]] which led to the current [[Information Age]] and the [[Internet]].@@@@1@15@@danf@17-8-2009 10140360@unknown@formal@none@1@S@* In [[cryptography]], [[Cryptanalysis of the Enigma|breaking the Enigma machine]] was an important factor contributing to the Allied victory in World War II.@@@@1@23@@danf@17-8-2009 10140370@unknown@formal@none@1@S@* [[Scientific computing]] enabled advanced study of the mind and mapping the human genome was possible with [[Human Genome Project]].@@@@1@20@@danf@17-8-2009 
10140380@unknown@formal@none@1@S@[[Distributed computing]] projects like [[Folding@home]] explore [[protein folding]].@@@@1@8@@danf@17-8-2009 10140390@unknown@formal@none@1@S@* [[Algorithmic trading]] has increased the [[Economic efficiency|efficiency]] and [[Market liquidity|liquidity]] of financial markets by using [[artificial intelligence]], [[machine learning]] and other [[statistics|statistical]] and [[Numerical analysis|numerical]] techniques on a large scale.@@@@1@31@@danf@17-8-2009 10140400@unknown@formal@none@1@S@== Relationship with other fields ==@@@@1@6@@danf@17-8-2009 10140410@unknown@formal@none@1@S@Despite its name, a significant amount of computer science does not involve the study of computers themselves.@@@@1@17@@danf@17-8-2009 10140420@unknown@formal@none@1@S@Because of this, several alternative names have been proposed.@@@@1@9@@danf@17-8-2009 10140430@unknown@formal@none@1@S@Danish scientist [[Peter Naur]] suggested the term ''datalogy'', to reflect the fact that the scientific discipline revolves around data and data treatment, while not necessarily involving computers.@@@@1@27@@danf@17-8-2009 10140440@unknown@formal@none@1@S@The first scientific institution to use the term was the Department of Datalogy at the University of Copenhagen, founded in 1969, with Peter Naur being the first professor in datalogy.@@@@1@30@@danf@17-8-2009 10140450@unknown@formal@none@1@S@The term is used mainly in the Scandinavian countries.@@@@1@9@@danf@17-8-2009 10140460@unknown@formal@none@1@S@Also, in the early days of computing, a number of terms for the practitioners of the field of computing were suggested in the ''Communications of the ACM''—''turingineer'', ''turologist'', ''flow-charts-man'', ''applied meta-mathematician'', and ''applied epistemologist''.@@@@1@36@@danf@17-8-2009 10140470@unknown@formal@none@1@S@Three months later in the same journal, ''comptologist'' was suggested, followed next year by ''hypologist''.@@@@1@15@@danf@17-8-2009 10140480@unknown@formal@none@1@S@Recently the term ''computics'' has been suggested.@@@@1@7@@danf@17-8-2009 10140490@unknown@formal@none@1@S@''Informatik'' is a term used with greater frequency in Europe.@@@@1@10@@danf@17-8-2009 10140500@unknown@formal@none@1@S@The renowned computer scientist [[Edsger W.
Dijkstra|Edsger Dijkstra]] stated, "Computer science is no more about computers than astronomy is about telescopes."@@@@1@21@@danf@17-8-2009 10140510@unknown@formal@none@1@S@The design and deployment of computers and computer systems is generally considered the province of disciplines other than computer science.@@@@1@20@@danf@17-8-2009 10140520@unknown@formal@none@1@S@For example, the study of [[computer hardware]] is usually considered part of [[computer engineering]], while the study of commercial [[computer system]]s and their deployment is often called [[information technology]] or [[information systems]].@@@@1@32@@danf@17-8-2009 10140530@unknown@formal@none@1@S@Computer science is sometimes criticized as being insufficiently scientific, a view espoused in the statement "Science is to computer science as hydrodynamics is to plumbing", credited to [[Stan Kelly-Bootle]] and others.@@@@1@31@@danf@17-8-2009 10140540@unknown@formal@none@1@S@However, there has been much cross-fertilization of ideas between the various computer-related disciplines.@@@@1@13@@danf@17-8-2009 10140550@unknown@formal@none@1@S@Computer science research has also often crossed into other disciplines, such as [[cognitive science]], [[economics]], [[mathematics]], [[physics]] (see [[quantum computing]]), and [[linguistics]].@@@@1@22@@danf@17-8-2009 10140560@unknown@formal@none@1@S@Computer science is considered by some to have a much closer relationship with [[mathematics]] than many scientific disciplines.@@@@1@18@@danf@17-8-2009 10140570@unknown@formal@none@1@S@Early computer science was strongly influenced by the work of mathematicians such as [[Kurt Gödel]] and [[Alan Turing]], and there continues to be a useful interchange of ideas between the two fields in areas such as [[mathematical logic]], [[category theory]], [[domain theory]], and [[algebra]].@@@@1@44@@danf@17-8-2009 10140580@unknown@formal@none@1@S@The relationship between computer science and [[software engineering]] is a contentious issue, which is further muddied by [[Debates within software engineering|disputes]] over what the term "software engineering" means, and how computer science is defined.@@@@1@34@@danf@17-8-2009 10140590@unknown@formal@none@1@S@[[David Parnas]], taking a cue from the relationship between other engineering and science disciplines, has claimed that the principal focus of computer science is studying the properties of computation in general, while the principal focus of software engineering is the design of specific computations to achieve practical goals, making the two separate but complementary disciplines.@@@@1@55@@danf@17-8-2009 10140600@unknown@formal@none@1@S@The academic, political, and funding aspects of computer science tend to have roots as to whether a department in the U.S. 
formed with a mathematical emphasis or an engineering emphasis.@@@@1@31@@danf@17-8-2009 10140610@unknown@formal@none@1@S@In general, electrical engineering-based computer science departments have tended to succeed as computer science and/or engineering departments.@@@@1@17@@danf@17-8-2009 10140620@unknown@formal@none@1@S@Computer science departments with a mathematics emphasis and with a numerical orientation consider alignment with [[computational science]].@@@@1@16@@danf@17-8-2009 10140630@unknown@formal@none@1@S@Both types of departments tend to make efforts to bridge the field educationally if not across all research.@@@@1@18@@danf@17-8-2009 10140640@unknown@formal@none@1@S@== Fields of computer science ==@@@@1@6@@danf@17-8-2009 10140650@unknown@formal@none@1@S@Computer science searches for concepts and [[formal proof]]s to explain and describe computational systems of interest.@@@@1@16@@danf@17-8-2009 10140660@unknown@formal@none@1@S@As with all sciences, these theories can then be utilised to synthesize practical engineering applications, which in turn may suggest new systems to be studied and analysed.@@@@1@27@@danf@17-8-2009 10140670@unknown@formal@none@1@S@While the [[ACM Computing Classification System]] can be used to split computer science up into different topics or fields, a more descriptive breakdown follows:@@@@1@24@@danf@17-8-2009 10140680@unknown@formal@none@1@S@=== Mathematical foundations ===@@@@1@4@@danf@17-8-2009 10140690@unknown@formal@none@1@S@; [[Mathematical logic]]@@@@1@3@@danf@17-8-2009 10140700@unknown@formal@none@1@S@: Boolean logic and other ways of modeling logical queries; the uses and limitations of formal proof methods.@@@@1@18@@danf@17-8-2009 10140710@unknown@formal@none@1@S@; [[Number theory]]@@@@1@3@@danf@17-8-2009 10140720@unknown@formal@none@1@S@: Theory of proofs and heuristics for finding proofs in the simple domain of integers.@@@@1@15@@danf@17-8-2009 10140730@unknown@formal@none@1@S@Used in [[cryptography]] as well as a test domain in [[artificial intelligence]].@@@@1@12@@danf@17-8-2009 10140740@unknown@formal@none@1@S@; [[Graph theory]]@@@@1@3@@danf@17-8-2009 10140750@unknown@formal@none@1@S@: Foundations for data structures and searching algorithms.@@@@1@8@@danf@17-8-2009 10140760@unknown@formal@none@1@S@; [[Type theory]]@@@@1@3@@danf@17-8-2009 10140770@unknown@formal@none@1@S@: Formal analysis of the types of data, and the use of these types to understand properties of programs, especially program safety.@@@@1@22@@danf@17-8-2009 10140780@unknown@formal@none@1@S@; [[Category theory]]@@@@1@3@@danf@17-8-2009 10140790@unknown@formal@none@1@S@: Category theory provides a means of capturing all of math and computation in a single synthesis.@@@@1@17@@danf@17-8-2009 10140800@unknown@formal@none@1@S@; [[Computational geometry]]@@@@1@3@@danf@17-8-2009 10140810@unknown@formal@none@1@S@: The study of [[algorithm]]s to solve problems stated in terms of [[geometry]].@@@@1@13@@danf@17-8-2009 10140820@unknown@formal@none@1@S@; [[Numerical analysis]]@@@@1@3@@danf@17-8-2009 10140830@unknown@formal@none@1@S@: Foundations for algorithms in discrete mathematics, as well as the study of the limitations of floating point computation, including [[round-off]] errors.@@@@1@22@@danf@17-8-2009 10140840@unknown@formal@none@1@S@=== Theory of computation ===@@@@1@5@@danf@17-8-2009 10140850@unknown@formal@none@1@S@; [[Automata theory]]@@@@1@3@@danf@17-8-2009 10140860@unknown@formal@none@1@S@: Different logical structures for solving problems.@@@@1@7@@danf@17-8-2009
10140870@unknown@formal@none@1@S@; [[Computability theory (computer science)|Computability theory]]@@@@1@6@@danf@17-8-2009 10140880@unknown@formal@none@1@S@: What is calculable with the current models of computers.@@@@1@10@@danf@17-8-2009 10140890@unknown@formal@none@1@S@Proofs developed by [[Alan Turing]] and others provide insight into the possibilities of what can be computed and what cannot.@@@@1@20@@danf@17-8-2009 10140900@unknown@formal@none@1@S@; [[Computational complexity theory]]@@@@1@4@@danf@17-8-2009 10140910@unknown@formal@none@1@S@: Fundamental bounds (especially time and storage space) on classes of computations; in practice, study of which problems a computer can solve with reasonable resources (while computability theory studies which problems can be solved at all).@@@@1@36@@danf@17-8-2009 10140920@unknown@formal@none@1@S@; [[Quantum computing|Quantum computing theory]]@@@@1@5@@danf@17-8-2009 10140930@unknown@formal@none@1@S@: Representation and manipulation of data using the quantum properties of particles and quantum mechanics.@@@@1@15@@danf@17-8-2009 10140940@unknown@formal@none@1@S@=== Algorithms and data structures ===@@@@1@6@@danf@17-8-2009 10140950@unknown@formal@none@1@S@; [[Analysis of algorithms]]@@@@1@4@@danf@17-8-2009 10140960@unknown@formal@none@1@S@: Time and space complexity of algorithms.@@@@1@7@@danf@17-8-2009 10140970@unknown@formal@none@1@S@; [[Algorithms]]@@@@1@2@@danf@17-8-2009 10140980@unknown@formal@none@1@S@: Formal logical processes used for computation, and the efficiency of these processes.@@@@1@13@@danf@17-8-2009 10140990@unknown@formal@none@1@S@=== Programming languages and compilers ===@@@@1@6@@danf@17-8-2009 10141000@unknown@formal@none@1@S@; [[Compiler]]s@@@@1@2@@danf@17-8-2009 10141010@unknown@formal@none@1@S@: Ways of translating computer programs, usually from [[high-level programming language|higher level]] languages to [[low-level programming language|lower level]] ones.@@@@1@19@@danf@17-8-2009 10141020@unknown@formal@none@1@S@; [[Interpreter (computing)|Interpreter]]s@@@@1@3@@danf@17-8-2009 10141030@unknown@formal@none@1@S@: A program that takes a computer program as input and executes it.@@@@1@14@@danf@17-8-2009 10141040@unknown@formal@none@1@S@; [[Programming language]]s@@@@1@3@@danf@17-8-2009 10141050@unknown@formal@none@1@S@: Formal language paradigms for expressing algorithms, and the properties of these languages (e.g., what problems they are suited to solve).@@@@1@21@@danf@17-8-2009 10141060@unknown@formal@none@1@S@=== Concurrent, parallel, and distributed systems ===@@@@1@7@@danf@17-8-2009 10141070@unknown@formal@none@1@S@; [[Concurrency (computer science)|Concurrency]]@@@@1@4@@danf@17-8-2009 10141080@unknown@formal@none@1@S@: The theory and practice of simultaneous computation; data safety in any multitasking or multithreaded environment.@@@@1@16@@danf@17-8-2009 10141090@unknown@formal@none@1@S@; [[Distributed computing]]@@@@1@3@@danf@17-8-2009 10141100@unknown@formal@none@1@S@: Computing using multiple computing devices over a network to accomplish a common objective or task and thereby reducing the latency involved in single processor contributions for any task.@@@@1@29@@danf@17-8-2009 10141110@unknown@formal@none@1@S@; [[Parallel computing]]@@@@1@3@@danf@17-8-2009 10141120@unknown@formal@none@1@S@: Computing using multiple concurrent threads of execution.@@@@1@8@@danf@17-8-2009 10141130@unknown@formal@none@1@S@=== Software engineering ===@@@@1@4@@danf@17-8-2009 10141140@unknown@formal@none@1@S@; [[Algorithm
design]]@@@@1@3@@danf@17-8-2009 10141150@unknown@formal@none@1@S@: Using ideas from algorithm theory to creatively design solutions to real tasks@@@@1@13@@danf@17-8-2009 10141160@unknown@formal@none@1@S@; [[Computer programming]]@@@@1@3@@danf@17-8-2009 10141170@unknown@formal@none@1@S@: The practice of using a programming language to implement algorithms@@@@1@11@@danf@17-8-2009 10141180@unknown@formal@none@1@S@; [[Formal methods]]@@@@1@3@@danf@17-8-2009 10141190@unknown@formal@none@1@S@: Mathematical approaches for describing and reasoning about software designs.@@@@1@10@@danf@17-8-2009 10141200@unknown@formal@none@1@S@; [[Reverse engineering]]@@@@1@3@@danf@17-8-2009 10141210@unknown@formal@none@1@S@: The application of the scientific method to the understanding of arbitrary existing software@@@@1@14@@danf@17-8-2009 10141220@unknown@formal@none@1@S@; [[Software development]]@@@@1@3@@danf@17-8-2009 10141230@unknown@formal@none@1@S@: The principles and practice of designing, developing, and testing programs, as well as proper engineering practices.@@@@1@17@@danf@17-8-2009 10141240@unknown@formal@none@1@S@=== System architecture ===@@@@1@4@@danf@17-8-2009 10141250@unknown@formal@none@1@S@; [[Computer architecture]]@@@@1@3@@danf@17-8-2009 10141260@unknown@formal@none@1@S@: The design, organization, optimization and verification of a computer system, mostly about [[CPU]]s and [[memory (computers)|memory]] subsystems (and the bus connecting them).@@@@1@23@@danf@17-8-2009 10141270@unknown@formal@none@1@S@; [[Computer organization]]@@@@1@3@@danf@17-8-2009 10141280@unknown@formal@none@1@S@: The implementation of computer architectures, in terms of descriptions of their specific [[electrical circuit]]ry@@@@1@15@@danf@17-8-2009 10141290@unknown@formal@none@1@S@; [[Operating system]]s@@@@1@3@@danf@17-8-2009 10141300@unknown@formal@none@1@S@: Systems for managing computer programs and providing the basis of a useable system.@@@@1@14@@danf@17-8-2009 10141310@unknown@formal@none@1@S@=== Communications ===@@@@1@3@@danf@17-8-2009 10141320@unknown@formal@none@1@S@; [[Computer audio]]@@@@1@3@@danf@17-8-2009 10141330@unknown@formal@none@1@S@: Algorithms and data structures for the creation, manipulation, storage, and transmission of [[digital audio]] recordings.@@@@1@16@@danf@17-8-2009 10141340@unknown@formal@none@1@S@Also important in [[voice recognition]] applications.@@@@1@6@@danf@17-8-2009 10141350@unknown@formal@none@1@S@; [[Computer networking|Networking]]@@@@1@3@@danf@17-8-2009 10141360@unknown@formal@none@1@S@: Algorithms and protocols for communicating data across different shared or dedicated media, often including [[error correction]].@@@@1@17@@danf@17-8-2009 10141370@unknown@formal@none@1@S@; [[Cryptography]]@@@@1@2@@danf@17-8-2009 10141380@unknown@formal@none@1@S@: Applies results from complexity, probability and number theory to invent and break codes.@@@@1@14@@danf@17-8-2009 10141390@unknown@formal@none@1@S@=== Databases ===@@@@1@3@@danf@17-8-2009 10141400@unknown@formal@none@1@S@; [[Data mining]]@@@@1@3@@danf@17-8-2009 10141410@unknown@formal@none@1@S@: Data mining is the extraction of relevant data from all sources of data.@@@@1@14@@danf@17-8-2009 10141420@unknown@formal@none@1@S@; [[Relational databases]]@@@@1@3@@danf@17-8-2009 10141430@unknown@formal@none@1@S@: Study of algorithms for searching and processing information in documents and databases; closely related to [[information retrieval]].@@@@1@18@@danf@17-8-2009 10141440@unknown@formal@none@1@S@; [[OLAP]]@@@@1@2@@danf@17-8-2009 
10141450@unknown@formal@none@1@S@: Online Analytical Processing, or OLAP, is an approach to quickly provide answers to analytical queries that are multi-dimensional in nature.@@@@1@21@@danf@17-8-2009 10141460@unknown@formal@none@1@S@OLAP is part of the broader category [[business intelligence]], which also encompasses relational reporting and data mining.@@@@1@17@@danf@17-8-2009 10141470@unknown@formal@none@1@S@=== Artificial intelligence ===@@@@1@4@@danf@17-8-2009 10141480@unknown@formal@none@1@S@; [[Artificial intelligence]]@@@@1@3@@danf@17-8-2009 10141490@unknown@formal@none@1@S@: The implementation and study of systems that exhibit an autonomous intelligence or behaviour of their own.@@@@1@17@@danf@17-8-2009 10141500@unknown@formal@none@1@S@; [[Artificial life]]@@@@1@3@@danf@17-8-2009 10141510@unknown@formal@none@1@S@: The study of digital organisms to learn about biological systems and evolution.@@@@1@13@@danf@17-8-2009 10141520@unknown@formal@none@1@S@; [[Automated reasoning]]@@@@1@3@@danf@17-8-2009 10141530@unknown@formal@none@1@S@: Solving engines, such as used in [[Prolog]], which produce steps to a result given a query on a fact and rule database.@@@@1@23@@danf@17-8-2009 10141540@unknown@formal@none@1@S@; [[Computer vision]]@@@@1@3@@danf@17-8-2009 10141550@unknown@formal@none@1@S@: Algorithms for identifying three dimensional objects from one or more two dimensional pictures.@@@@1@14@@danf@17-8-2009 10141560@unknown@formal@none@1@S@; [[Machine learning]]@@@@1@3@@danf@17-8-2009 10141570@unknown@formal@none@1@S@: Automated creation of a set of rules and axioms based on input.@@@@1@13@@danf@17-8-2009 10141580@unknown@formal@none@1@S@; [[Natural language processing]]/[[Computational linguistics]]@@@@1@5@@danf@17-8-2009 10141590@unknown@formal@none@1@S@: Automated understanding and generation of human language@@@@1@8@@danf@17-8-2009 10141600@unknown@formal@none@1@S@; [[Robotics]]@@@@1@2@@danf@17-8-2009 10141610@unknown@formal@none@1@S@: Algorithms for controlling the behavior of robots.@@@@1@8@@danf@17-8-2009 10141620@unknown@formal@none@1@S@=== Visual rendering (or Computer graphics) ===@@@@1@7@@danf@17-8-2009 10141630@unknown@formal@none@1@S@; [[Computer graphics]]@@@@1@3@@danf@17-8-2009 10141640@unknown@formal@none@1@S@: Algorithms both for generating visual images synthetically, and for integrating or altering visual and spatial information sampled from the real world.@@@@1@22@@danf@17-8-2009 10141650@unknown@formal@none@1@S@; [[Image processing]]@@@@1@3@@danf@17-8-2009 10141660@unknown@formal@none@1@S@: Determining information from an image through computation.@@@@1@8@@danf@17-8-2009 10141670@unknown@formal@none@1@S@=== Human-Computer Interaction ===@@@@1@4@@danf@17-8-2009 10141680@unknown@formal@none@1@S@; [[Human computer interaction]]@@@@1@4@@danf@17-8-2009 10141690@unknown@formal@none@1@S@: The study of making computers and computations useful, usable and universally accessible to [[user (computing)|people]], including the study and design of computer interfaces through which people use computers.@@@@1@29@@danf@17-8-2009 10141700@unknown@formal@none@1@S@=== Scientific computing ===@@@@1@4@@danf@17-8-2009 10141710@unknown@formal@none@1@S@; [[Bioinformatics]]@@@@1@2@@danf@17-8-2009 10141720@unknown@formal@none@1@S@: The use of computer science to maintain, analyse, and store [[biological data]], and to assist in solving biological problems such as [[protein folding]], function prediction and [[phylogeny]].@@@@1@28@@danf@17-8-2009 10141730@unknown@formal@none@1@S@; [[Cognitive 
Science]]@@@@1@3@@danf@17-8-2009 10141740@unknown@formal@none@1@S@: Computational modelling of real minds@@@@1@6@@danf@17-8-2009 10141750@unknown@formal@none@1@S@; [[Computational chemistry]]@@@@1@3@@danf@17-8-2009 10141760@unknown@formal@none@1@S@: Computational modelling of theoretical chemistry in order to determine chemical structures and properties@@@@1@14@@danf@17-8-2009 10141770@unknown@formal@none@1@S@; [[Computational neuroscience]]@@@@1@3@@danf@17-8-2009 10141780@unknown@formal@none@1@S@: Computational modelling of real brains@@@@1@6@@danf@17-8-2009 10141790@unknown@formal@none@1@S@; [[Computational physics]]@@@@1@3@@danf@17-8-2009 10141800@unknown@formal@none@1@S@: Numerical simulations of large non-analytic systems@@@@1@7@@danf@17-8-2009 10141810@unknown@formal@none@1@S@; [[Numerical analysis|Numerical algorithms]]@@@@1@4@@danf@17-8-2009 10141820@unknown@formal@none@1@S@: Algorithms for the numerical solution of mathematical problems such as [[Root-finding algorithm|root-finding]], [[Numerical integration|integration]], the [[Numerical ordinary differential equations|solution of ordinary differential equations]] and the approximation/evaluation of [[special functions]].@@@@1@30@@danf@17-8-2009 10141830@unknown@formal@none@1@S@; [[Symbolic mathematics]]@@@@1@3@@danf@17-8-2009 10141840@unknown@formal@none@1@S@: Manipulation and solution of expressions in symbolic form, also known as [[Computer algebra]].@@@@1@14@@danf@17-8-2009 10141850@unknown@formal@none@1@S@=== Didactics of computer science/informatics ===@@@@1@6@@danf@17-8-2009 10141860@unknown@formal@none@1@S@The subfield didactics of computer science focuses on cognitive approaches of developing competencies of computer science and specific strategies for analysis, design, implementation and evaluation of excellent lessons in computer science.@@@@1@31@@danf@17-8-2009 10141870@unknown@formal@none@1@S@== Computer science education ==@@@@1@5@@danf@17-8-2009 10141880@unknown@formal@none@1@S@Some universities teach computer science as a theoretical study of computation and algorithmic reasoning.@@@@1@14@@danf@17-8-2009 10141890@unknown@formal@none@1@S@These programs often feature the [[theory of computation]], [[analysis of algorithms]], [[formal methods]], [[Concurrency (computer science)|concurrency theory]], [[databases]], [[computer graphics]] and [[systems analysis]], among others.@@@@1@25@@danf@17-8-2009 10141900@unknown@formal@none@1@S@They typically also teach [[computer programming]], but treat it as a vessel for the support of other fields of computer science rather than a central focus of high-level study.@@@@1@29@@danf@17-8-2009 10141910@unknown@formal@none@1@S@Other colleges and universities, as well as [[secondary school]]s and vocational programs that teach computer science, emphasize the practice of advanced [[computer programming]] rather than the theory of algorithms and computation in their computer science curricula.@@@@1@36@@danf@17-8-2009 10141920@unknown@formal@none@1@S@Such curricula tend to focus on those skills that are important to workers entering the software industry.@@@@1@17@@danf@17-8-2009 10141930@unknown@formal@none@1@S@The practical aspects of computer programming are often referred to as [[software engineering]].@@@@1@13@@danf@17-8-2009 10141940@unknown@formal@none@1@S@However, there is a lot of [[Debates within software engineering|disagreement]] over what the term "software engineering" actually means, and whether it is the same thing as programming.@@@@1@27@@danf@17-8-2009 
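To make the root-finding mentioned under ''Numerical algorithms'' above concrete, here is a minimal illustrative sketch of the [[bisection method]] in [[Python (programming language)|Python]]; the example function, tolerance and names are invented for this sketch and are not drawn from any particular source.

 # Bisection sketch: find x with f(x) = 0 on [lo, hi], assuming f is
 # continuous and f(lo), f(hi) have opposite signs.
 def bisect(f, lo, hi, tol=1e-12, max_iter=200):
     f_lo = f(lo)
     if f_lo * f(hi) > 0:
         raise ValueError("f(lo) and f(hi) must bracket a root")
     for _ in range(max_iter):
         mid = (lo + hi) / 2.0
         f_mid = f(mid)
         if abs(f_mid) < tol or (hi - lo) / 2.0 < tol:
             return mid
         # Keep the half-interval whose endpoints still bracket the root.
         if f_lo * f_mid < 0:
             hi = mid
         else:
             lo, f_lo = mid, f_mid
     return (lo + hi) / 2.0

 # Example: the positive root of x**2 - 2 is the square root of 2.
 print(bisect(lambda x: x * x - 2.0, 0.0, 2.0))  # prints roughly 1.41421356

A routine like this is stated purely in terms of arithmetic and comparisons, which is why it can be implemented in any general-purpose programming language.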
10150010@unknown@formal@none@1@S@
Corpus linguistics
@@@@1@2@@danf@17-8-2009 10150020@unknown@formal@none@1@S@'''Corpus linguistics''' is the [[study of language]] as expressed in [[sample]]s ''([[Text corpus|corpora]])'' or "real world" text.@@@@1@17@@danf@17-8-2009 10150030@unknown@formal@none@1@S@This method represents a [[digest]]ive approach to deriving a set of abstract rules by which a [[natural language]] is governed or else relates to another language.@@@@1@26@@danf@17-8-2009 10150040@unknown@formal@none@1@S@Originally done by hand, corpora are largely derived by an automated process, which is corrected.@@@@1@15@@danf@17-8-2009 10150050@unknown@formal@none@1@S@Computational methods had once been viewed as a [[holy grail]] of [[linguistics|linguistic]] research, which would ultimately manifest a [[ruleset]] for [[natural language processing]] and [[machine translation]] at a high level.@@@@1@30@@danf@17-8-2009 10150060@unknown@formal@none@1@S@Such has not been the case, and since the [[cognitive revolution]], cognitive linguistics has been largely critical of many claimed practical uses for corpora.@@@@1@24@@danf@17-8-2009 10150070@unknown@formal@none@1@S@However, as [[computation]] capacity and speed have increased, the use of corpora to study language and term relationships en masse has gained some respectability.@@@@1@24@@danf@17-8-2009 10150080@unknown@formal@none@1@S@The corpus approach runs counter to [[Noam Chomsky]]'s view that real language is riddled with performance-related errors, thus requiring careful analysis of small speech samples obtained in a highly controlled laboratory setting.@@@@1@32@@danf@17-8-2009 10150090@unknown@formal@none@1@S@Corpus linguistics does away with Chomsky's ''competence/performance'' split; adherents believe that reliable language analysis best occurs on field-collected samples, in natural contexts and with minimal experimental interference.@@@@1@27@@danf@17-8-2009 10150100@unknown@formal@none@1@S@== History ==@@@@1@3@@danf@17-8-2009 10150110@unknown@formal@none@1@S@A landmark in modern corpus linguistics was the publication by [[Henry Kucera]] and [[Nelson Francis]] of ''Computational Analysis of Present-Day American English'' in 1967, a work based on the analysis of the [[Brown Corpus]], a carefully compiled selection of current American English, totalling about a million words drawn from a wide variety of sources.@@@@1@54@@danf@17-8-2009 10150120@unknown@formal@none@1@S@Kucera and Francis subjected it to a variety of computational analyses, from which they compiled a rich and variegated opus, combining elements of linguistics, language teaching, [[psychology]], [[statistics]], and [[sociology]].@@@@1@30@@danf@17-8-2009 10150130@unknown@formal@none@1@S@A further key publication was [[Randolph Quirk]]'s 'Towards a description of English Usage' (1960, Transactions of the Philological Society, 40-61) in which he introduced ''The Survey of English Usage''.@@@@1@29@@danf@17-8-2009 10150140@unknown@formal@none@1@S@Shortly thereafter, Boston publisher [[Houghton-Mifflin]] approached Kucera to supply a million word, three-line citation base for its new ''[[The American Heritage Dictionary of the English Language|American Heritage Dictionary]]'', the first [[dictionary]] to be compiled using corpus linguistics.@@@@1@37@@danf@17-8-2009 10150150@unknown@formal@none@1@S@The AHD made the innovative step of combining prescriptive elements (how language ''should'' be used) with descriptive information (how it actually ''is'' used).@@@@1@23@@danf@17-8-2009 10150160@unknown@formal@none@1@S@Other publishers 
followed suit.@@@@1@4@@danf@17-8-2009 10150170@unknown@formal@none@1@S@The British publisher Collins' [[COBUILD]] [[monolingual learner's dictionary]], designed for users learning [[English language learning and teaching|English as a foreign language]], was compiled using the [[Bank of English]].@@@@1@28@@danf@17-8-2009 10150180@unknown@formal@none@1@S@The [[Brown Corpus]] has also spawned a number of similarly structured corpora: the [[LOB Corpus]] (1960s [[British English]]), Kolhapur ([[Indian English]]), Wellington ([[New Zealand English]]), Australian Corpus of English ([[Australian English]]), the Frown Corpus ([[early 1990s]] [[American English]]), and the FLOB Corpus (1990s British English).@@@@1@45@@danf@17-8-2009 10150190@unknown@formal@none@1@S@Other corpora represent many languages, varieties and modes, and include the [[International Corpus of English]], and the [[British National Corpus]], a 100 million word collection of a range of spoken and written texts, created in the 1990s by a consortium of publishers, universities ([[Oxford University|Oxford]] and [[Lancaster University|Lancaster]]) and the [[British Library]].@@@@1@52@@danf@17-8-2009 10150200@unknown@formal@none@1@S@For contemporary American English, work has stalled on the [[American National Corpus]], but the 360 million word [[Corpus of Contemporary American English (COCA)]] (1990-present) is now available.@@@@1@27@@danf@17-8-2009 10150210@unknown@formal@none@1@S@== Methods ==@@@@1@3@@danf@17-8-2009 10150220@unknown@formal@none@1@S@This means dealing with real input data, where descriptions based on a linguist's intuition are not usually helpful.@@@@1@18@@danf@17-8-2009 10160010@unknown@formal@none@1@S@
Cross-platform
@@@@1@1@@danf@17-8-2009 10160020@unknown@formal@none@1@S@'''Cross-platform''' (also known as '''multi-platform''') is a term used in computing to refer to [[computer program]]s, [[operating system]]s, [[computer language]]s, [[programming language]]s, or other [[computer software]] and their implementations which can be made to work on multiple [[computer platform]]s.@@@@1@39@@danf@17-8-2009 10160030@unknown@formal@none@1@S@“Cross-platform” and “multi-platform” both refer to the idea that a given piece of computer software is able to be run on more than one computer platform.@@@@1@26@@danf@17-8-2009 10160040@unknown@formal@none@1@S@There are two major types of cross-platform software; one requires building for each platform that it supports (e.g., is written in a compiled language, such as [[Pascal (programming language)|Pascal]]), and the other one can be directly run on any platform which supports it (e.g., software written in an [[interpreted language]] such as [[Perl]], [[Python (programming language)|Python]], or [[shell script]]) or software written in a language which compiles to [[bytecode]] and the bytecode is redistributed (such as is the case with [[Java (programming language)|Java]] and languages used in the [[.NET Framework]]) such as [[Chrome (programming language)|Chrome]].@@@@1@95@@danf@17-8-2009 10160050@unknown@formal@none@1@S@For example, a cross-platform [[application software|application]] may run on [[Microsoft Windows]] on the [[x86 architecture]], [[Linux]] on the [[x86 architecture]] and [[Mac OS X]] on either the [[PowerPC]] or [[x86]] based [[Apple Macintosh]] systems.@@@@1@34@@danf@17-8-2009 10160060@unknown@formal@none@1@S@A cross-platform [[application software|application]] may run on as many as all existing platforms, or on as few as two platforms.@@@@1@20@@danf@17-8-2009 10160070@unknown@formal@none@1@S@== Platforms ==@@@@1@3@@danf@17-8-2009 10160080@unknown@formal@none@1@S@A platform is a combination of hardware and software used to run software applications.@@@@1@14@@danf@17-8-2009 10160090@unknown@formal@none@1@S@A platform can be described simply as an operating system or computer architecture, or it could be the combination of both.@@@@1@21@@danf@17-8-2009 10160100@unknown@formal@none@1@S@Probably the most familiar platform is [[Microsoft Windows]] running on the [[x86 architecture]].@@@@1@13@@danf@17-8-2009 10160110@unknown@formal@none@1@S@Other well-known desktop computer platforms include [[Linux]] and [[Mac OS X]] (both of which are themselves cross-platform).@@@@1@17@@danf@17-8-2009 10160120@unknown@formal@none@1@S@There are, however, many devices such as [[cellular telephones]] that are also effectively computer platforms but less commonly thought about in that way.@@@@1@23@@danf@17-8-2009 10160130@unknown@formal@none@1@S@[[Application software]] can be written to depend on the features of a particular platform—either the hardware, operating system, or virtual machine it runs on.@@@@1@24@@danf@17-8-2009 10160140@unknown@formal@none@1@S@The [[Java Platform|Java platform]] is a [[virtual machine]] platform which runs on many operating systems and hardware types, and is a common platform for software to be written for.@@@@1@29@@danf@17-8-2009 10160150@unknown@formal@none@1@S@=== Hardware platforms ===@@@@1@4@@danf@17-8-2009 10160160@unknown@formal@none@1@S@A '''hardware platform''' can refer to a computer’s [[computer architecture|architecture]] or [[processor architecture]].@@@@1@13@@danf@17-8-2009 10160170@unknown@formal@none@1@S@For example, the 
[[x86]] and [[x86-64]] [[CPU]]s make up one of the most common [[computer architecture]]s in use in home machines today.@@@@1@22@@danf@17-8-2009 10160180@unknown@formal@none@1@S@These machines commonly run [[Microsoft Windows]], though they can run other [[operating system]]s as well, including [[Linux]], [[OpenBSD]], [[NetBSD]], [[Mac OS X]] and [[FreeBSD]].@@@@1@24@@danf@17-8-2009 10160190@unknown@formal@none@1@S@=== Software platforms ===@@@@1@4@@danf@17-8-2009 10160200@unknown@formal@none@1@S@Software platforms can either be an [[operating system]] or programming environment, though more commonly it is a combination of both.@@@@1@20@@danf@17-8-2009 10160210@unknown@formal@none@1@S@A notable exception to this is [[Java (programming language)|Java]], which uses an [[operating system]] independent [[virtual machine]] for its [[compiled]] code, known in the world of Java as [[bytecode]].@@@@1@29@@danf@17-8-2009 10160220@unknown@formal@none@1@S@Examples of software platforms include:@@@@1@5@@danf@17-8-2009 10160230@unknown@formal@none@1@S@* [[MS-DOS]] ([[x86]]), [[DR-DOS]] ([[x86]]), [[FreeDOS]] ([[x86]]) etc.@@@@1@8@@danf@17-8-2009 10160240@unknown@formal@none@1@S@* [[Microsoft Windows]] ([[x86]], [[x64]])@@@@1@5@@danf@17-8-2009 10160250@unknown@formal@none@1@S@* [[Linux]] (x86, x64, [[PowerPC]], various other architectures)@@@@1@8@@danf@17-8-2009 10160260@unknown@formal@none@1@S@* [[Mac OS X]] (PowerPC, x86)@@@@1@6@@danf@17-8-2009 10160270@unknown@formal@none@1@S@* [[OS/2]], [[eComStation]]@@@@1@3@@danf@17-8-2009 10160280@unknown@formal@none@1@S@* [[AmigaOS]] ([[m68k]]), [[AROS]] (x86, PowerPC, m68k), [[MorphOS]] (PowerPC)@@@@1@9@@danf@17-8-2009 10160290@unknown@formal@none@1@S@* [[Java (programming language)|Java]]@@@@1@4@@danf@17-8-2009 10160300@unknown@formal@none@1@S@==== Java platform ====@@@@1@4@@danf@17-8-2009 10160310@unknown@formal@none@1@S@As previously noted, the [[Java platform]] is an exception to the general rule that an [[operating system]] is a software platform.@@@@1@21@@danf@17-8-2009 10160320@unknown@formal@none@1@S@The Java language provides a [[virtual machine]], or a “virtual CPU” which runs all of the code that is written for the language.@@@@1@23@@danf@17-8-2009 10160330@unknown@formal@none@1@S@This enables the same [[executable]] [[binary file|binary]] to run on all systems which support the Java software, through the [[Java Virtual Machine]].@@@@1@22@@danf@17-8-2009 10160340@unknown@formal@none@1@S@Java [[executable]]s do not run directly on the [[operating system]]; that is, neither [[Microsoft Windows|Windows]] nor [[Linux]] execute Java programs directly.@@@@1@21@@danf@17-8-2009 10160350@unknown@formal@none@1@S@Because of this, however, Java is limited in that it does not directly support system-specific functionality.@@@@1@16@@danf@17-8-2009 10160360@unknown@formal@none@1@S@[[Java Native Interface|JNI]] can be used to access system specific functions, but then the code is likely no longer portable.@@@@1@20@@danf@17-8-2009 10160370@unknown@formal@none@1@S@Java programs can run on at least the [[Microsoft Windows]], [[Mac OS X]], [[Linux]], and [[Solaris Operating System|Solaris]] operating systems, and so the language is limited to functionality that exists on all these systems.@@@@1@34@@danf@17-8-2009 10160380@unknown@formal@none@1@S@This includes things such as [[computer networking]], [[Internet socket]]s, but not necessarily raw hardware [[input/output]].@@@@1@15@@danf@17-8-2009 10160390@unknown@formal@none@1@S@== Cross-platform software 
==@@@@1@4@@danf@17-8-2009 10160400@unknown@formal@none@1@S@In order for software to be considered '''cross-platform''', it must be able to function on more than one [[computer architecture]] or [[operating system]].@@@@1@23@@danf@17-8-2009 10160410@unknown@formal@none@1@S@This can be a time-consuming task given that different [[operating system]]s have different [[application programming interface]]s or [[application programming interface|API]]s (for example, [[Linux]] uses a different [[application programming interface|API]] for [[application software]] than [[Microsoft Windows|Windows]] does).@@@@1@36@@danf@17-8-2009 10160420@unknown@formal@none@1@S@Just because a particular [[operating system]] may run on different [[computer architecture]]s, that does not mean that the software written for that operating system will automatically work on all [[computer architecture|architecture]]s that the operating system supports.@@@@1@36@@danf@17-8-2009 10160430@unknown@formal@none@1@S@One example as of August, 2006 was [[OpenOffice.org]], which did not natively run on the [[AMD64]] or [[EM64T]] lines of processors implementing the [[x86-64]] [[64-bit]] standards for computers; this has since been changed, and the OpenOffice.org suite of software is “mostly” ported to these 64-bit systems[http://wiki.services.openoffice.org/wiki/Porting_to_x86-64_(AMD64,_EM64T)].@@@@1@46@@danf@17-8-2009 10160440@unknown@formal@none@1@S@This also means that just because a program is written in a popular programming language such as [[C (programming language)|C]] or [[C++]], it does not mean it will run on all [[operating systems]] that support that [[programming language]].@@@@1@38@@danf@17-8-2009 10160450@unknown@formal@none@1@S@=== Web applications ===@@@@1@4@@danf@17-8-2009 10160460@unknown@formal@none@1@S@[[Web application]]s are typically described as cross-platform because, ideally, they are accessible from any of various [[web browser]]s within different operating systems.@@@@1@22@@danf@17-8-2009 10160470@unknown@formal@none@1@S@Such applications generally employ a [[client-server]] system architecture, and vary widely in complexity and functionality.@@@@1@15@@danf@17-8-2009 10160480@unknown@formal@none@1@S@This wide variability significantly complicates the goal of cross-platform capability, which is routinely at odds with the goal of advanced functionality.@@@@1@21@@danf@17-8-2009 10160490@unknown@formal@none@1@S@==== Basic applications ====@@@@1@4@@danf@17-8-2009 10160500@unknown@formal@none@1@S@Basic web applications perform all or most processing from a [[Stateless server|stateless]] [[web server]], and pass the result to the client web browser.@@@@1@23@@danf@17-8-2009 10160510@unknown@formal@none@1@S@All user interaction with the application consists of simple exchanges of data requests and server responses.@@@@1@16@@danf@17-8-2009 10160520@unknown@formal@none@1@S@These types of applications were the norm in the early phases of [[World Wide Web]] application development.@@@@1@17@@danf@17-8-2009 10160530@unknown@formal@none@1@S@Such applications follow a simple [[Transaction processing|transaction]] model, identical to that of serving [[static web page]]s.@@@@1@16@@danf@17-8-2009 10160540@unknown@formal@none@1@S@Today, they are still relatively common, especially where cross-platform compatibility and simplicity are deemed more critical than advanced functionality.@@@@1@19@@danf@17-8-2009 10160550@unknown@formal@none@1@S@==== Advanced applications ====@@@@1@4@@danf@17-8-2009 
10160560@unknown@formal@none@1@S@Prominent examples of advanced web applications include the Web interface to [[Gmail]], [[A9.com]], and the maps.live.com section of [[Live Search]].@@@@1@20@@danf@17-8-2009 10160570@unknown@formal@none@1@S@Such advanced applications routinely depend on additional features found only in the more recent versions of popular web browsers.@@@@1@19@@danf@17-8-2009 10160580@unknown@formal@none@1@S@These dependencies include [[Ajax (programming)|Ajax]], [[JavaScript]], [[Dynamic HTML|“Dynamic” HTML]], [[SVG]], and other components of [[rich internet application]]s.@@@@1@17@@danf@17-8-2009 10160590@unknown@formal@none@1@S@Older versions of popular browsers tend to lack support for certain features.@@@@1@12@@danf@17-8-2009 10160600@unknown@formal@none@1@S@==== Design strategies ====@@@@1@4@@danf@17-8-2009 10160610@unknown@formal@none@1@S@Because of the competing interests of cross-platform compatibility and advanced functionality, numerous alternative web application design strategies have emerged.@@@@1@19@@danf@17-8-2009 10160620@unknown@formal@none@1@S@Such strategies include:@@@@1@3@@danf@17-8-2009 10160630@unknown@formal@none@1@S@=====Graceful degradation=====@@@@1@2@@danf@17-8-2009 10160640@unknown@formal@none@1@S@Graceful degradation attempts to provide the same or similar functionality to all users and platforms, while diminishing that functionality to a ‘least common denominator’ for more limited client browsers.@@@@1@29@@danf@17-8-2009 10160650@unknown@formal@none@1@S@For example, a user attempting to use a limited-feature browser to access Gmail may notice that Gmail switches to “Basic Mode,” with reduced functionality.@@@@1@24@@danf@17-8-2009 10160660@unknown@formal@none@1@S@Some view this strategy as a lesser form of cross-platform capability.@@@@1@11@@danf@17-8-2009 10160670@unknown@formal@none@1@S@=====Separation of functionality=====@@@@1@3@@danf@17-8-2009 10160680@unknown@formal@none@1@S@Separation of functionality attempts to simply omit those subsets of functionality that are not capable from within certain client browsers or operating systems, while still delivering a ‘complete’ application to the user. 
(see also [[Separation of concerns]]).@@@@1@37@@danf@17-8-2009 10160690@unknown@formal@none@1@S@=====Multiple codebase=====@@@@1@2@@danf@17-8-2009 10160700@unknown@formal@none@1@S@Multiple codebase applications present different versions of an application depending on the specific client in use.@@@@1@16@@danf@17-8-2009 10160710@unknown@formal@none@1@S@This strategy is arguably the most complicated and expensive way to fulfill cross-platform capability, since even different versions of the same client browser (within the same operating system) can differ dramatically between each other.@@@@1@34@@danf@17-8-2009 10160720@unknown@formal@none@1@S@This is further complicated by the support for “plugins” which may or may not be present for any given installation of a particular browser version.@@@@1@25@@danf@17-8-2009 10160730@unknown@formal@none@1@S@=====Third party libraries=====@@@@1@3@@danf@17-8-2009 10160740@unknown@formal@none@1@S@Third party libraries attempt to simplify cross-platform capability by ‘hiding’ the complexities of client differentiation behind a single, unified API.@@@@1@20@@danf@17-8-2009 10160750@unknown@formal@none@1@S@==== Testing strategies ====@@@@1@4@@danf@17-8-2009 10160760@unknown@formal@none@1@S@One complicated aspect of cross-platform web application design is the need for [[software testing]].@@@@1@14@@danf@17-8-2009 10160770@unknown@formal@none@1@S@In addition to the complications mentioned previously, there is the additional restriction that some browsers prohibit installation of different versions of the same browser on the same operating system.@@@@1@29@@danf@17-8-2009 10160780@unknown@formal@none@1@S@Techniques such as [[full virtualization]] are sometimes used as a workaround for this problem.@@@@1@14@@danf@17-8-2009 10160790@unknown@formal@none@1@S@=== Traditional applications ===@@@@1@4@@danf@17-8-2009 10160800@unknown@formal@none@1@S@Although web applications are becoming increasingly popular, many computer users still use traditional [[application software]] which does not rely on a client/web-server architecture.@@@@1@23@@danf@17-8-2009 10160810@unknown@formal@none@1@S@The distinction between “traditional” and “web” applications is not always unambiguous, however, because applications have many different features, installation methods and architectures; and some of these can overlap and occur in ways that blur the distinction.@@@@1@36@@danf@17-8-2009 10160820@unknown@formal@none@1@S@Nevertheless, this simplifying distinction is a common and useful generalization.@@@@1@10@@danf@17-8-2009 10160830@unknown@formal@none@1@S@==== Binary software ====@@@@1@4@@danf@17-8-2009 10160840@unknown@formal@none@1@S@Traditionally in modern computing, application software has been distributed to end-users as '''binary images''', which are stored in [[executable]]s, a specific type of [[binary file]].@@@@1@25@@danf@17-8-2009 10160850@unknown@formal@none@1@S@Such [[executable]]s only support the [[operating system]] and [[computer architecture]] that they were built for—which means that making a “cross-platform executable” would be something of a massive task, and is generally not done.@@@@1@33@@danf@17-8-2009 10160860@unknown@formal@none@1@S@For software that is distributed as a [[binary file|binary]] [[executable]], such as software written in [[C (programming language)|C]] or [[C++]], the programmer must [[software build|build the software]] for each different [[operating system]] and [[computer architecture]].@@@@1@35@@danf@17-8-2009 
10160870@unknown@formal@none@1@S@For example, [[Mozilla]] [[Mozilla Firefox|Firefox]], an open-source web browser, is available on [[Microsoft Windows]], [[Mac OS X]] (both [[PowerPC]] and [[x86]] through something Apple calls a '''[[Universal binary]]'''), and [[Linux]] on multiple computer architectures.@@@@1@34@@danf@17-8-2009 10160880@unknown@formal@none@1@S@The three platforms (in this case, [[Microsoft Windows|Windows]], [[Mac OS X]], and [[Linux]]) are separate [[executable]] distributions, although they come from the same [[source code]].@@@@1@25@@danf@17-8-2009 10160890@unknown@formal@none@1@S@In the context of binary software, cross-platform programs are written in the source code and then “translated” to each system that it runs on through compiling it on different platforms.@@@@1@30@@danf@17-8-2009 10160900@unknown@formal@none@1@S@Also, software can be [[porting|ported]] to a new [[computer architecture]] or [[operating system]] so that the program becomes more cross-platform than it already is.@@@@1@24@@danf@17-8-2009 10160910@unknown@formal@none@1@S@For example, a program such as Firefox, which already runs on Windows on the x86 family, can be modified and re-built to run on Linux on the x86 (and potentially other architectures) as well.@@@@1@34@@danf@17-8-2009 10160920@unknown@formal@none@1@S@As an alternative to porting, cross-platform virtualization allows applications compiled for one CPU and operating system to run on a system with a different CPU and/or operating system, without modification to the source code or binaries.@@@@1@36@@danf@17-8-2009 10160930@unknown@formal@none@1@S@As an example, [[Apple Computer|Apple's]] [[Rosetta (software)|Rosetta]] software, which is built into [[Intel]]-based Apple Macintosh computers, runs applications compiled for the previous generation of Macs that used [[PowerPC]] CPUs.@@@@1@29@@danf@17-8-2009 10160940@unknown@formal@none@1@S@Another example is IBM PowerVM Lx86, which allows Linux/x86 applications to run unmodified on the Linux/Power operating system.@@@@1@18@@danf@17-8-2009 10160950@unknown@formal@none@1@S@==== Scripts and [[interpreted language]]s ====@@@@1@6@@danf@17-8-2009 10160960@unknown@formal@none@1@S@A script can be considered to be cross-platform if the [[scripting language]] is available on multiple platforms and the script only uses the facilities provided by the language.@@@@1@28@@danf@17-8-2009 10160970@unknown@formal@none@1@S@That is, a script written in [[Python (programming language)|Python]] for a [[Unix-like]] system will likely run with little or no modification on [[Microsoft Windows|Windows]], because Python also runs on [[Microsoft Windows|Windows]]; there is also more than one implementation of Python that will run the same scripts (e.g., [[IronPython]] for [[.NET Framework|.NET]]).@@@@1@51@@danf@17-8-2009 10160980@unknown@formal@none@1@S@The same goes for many of the [[open source]] [[programming language]]s that are available and are [[scripting language]]s.@@@@1@18@@danf@17-8-2009 10160990@unknown@formal@none@1@S@Unlike [[binary file|binary]] [[executable]]s, the same script can be used on all computers that have software to interpret the script.@@@@1@20@@danf@17-8-2009 10161000@unknown@formal@none@1@S@This is because the script is generally stored in [[plain text]] in a [[text file]].@@@@1@15@@danf@17-8-2009 10161010@unknown@formal@none@1@S@There may be some issues, however, such as the type of [[newline|new line character]] that sits between the lines.@@@@1@19@@danf@17-8-2009 
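To make the path and new line issues concrete, here is a minimal illustrative sketch, assuming [[Python (programming language)|Python]] 3 and only its standard library (the file name and the save_notes function are invented for this example), of a script that runs unchanged on [[Microsoft Windows|Windows]], [[Linux]] and [[Mac OS X]] because the interpreter abstracts those platform-specific details.

 import pathlib
 import sys

 def save_notes(lines):
     # pathlib joins path components with the separator of the host platform
     # ("\" on Windows, "/" on Unix-like systems), so none is hard-coded here.
     target = pathlib.Path.home() / "notes" / "todo.txt"
     target.parent.mkdir(parents=True, exist_ok=True)
     # Opening the file in text mode translates "\n" into the platform's own
     # new line sequence when writing, which sidesteps the newline issue above.
     with open(target, "w", encoding="utf-8") as handle:
         handle.write("\n".join(lines) + "\n")
     return target

 if __name__ == "__main__":
     where = save_notes(["first note", "second note"])
     print("Wrote", where, "on", sys.platform)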
10161020@unknown@formal@none@1@S@Generally, however, little or no work has to be done to make a script written for one system run on another.@@@@1@21@@danf@17-8-2009 10161030@unknown@formal@none@1@S@Some quite popular cross-platform scripting or [[interpreted language]]s are:@@@@1@9@@danf@17-8-2009 10161040@unknown@formal@none@1@S@* [[bash]]—A [[Unix shell]] commonly run on [[Linux]] and other modern [[Unix-like]] systems, as well as on [[Microsoft Windows|Windows]] via the [[Cygwin]] [[POSIX]] compatibility layer.@@@@1@25@@danf@17-8-2009 10161050@unknown@formal@none@1@S@* [[Python (programming language)|Python]]—A modern [[scripting language]] where the focus is on [[rapid application development]] and ease-of-writing, instead of program run-time efficiency.@@@@1@22@@danf@17-8-2009 10161060@unknown@formal@none@1@S@* [[Perl]]—A scripting language first released in 1987.@@@@1@8@@danf@17-8-2009 10161070@unknown@formal@none@1@S@Used for [[Common Gateway Interface|CGI]] [[WWW]] programming, small [[system administration]] tasks, and more.@@@@1@13@@danf@17-8-2009 10161080@unknown@formal@none@1@S@* [[PHP]]—A [[scripting language]] most popular in use on the [[WWW]] for [[web application]]s.@@@@1@14@@danf@17-8-2009 10161090@unknown@formal@none@1@S@* [[Ruby (programming language)|Ruby]]—A scripting language whose purpose is to be object-oriented and easy to read.@@@@1@16@@danf@17-8-2009 10161100@unknown@formal@none@1@S@Can also be used on the web through [[Ruby on Rails]].@@@@1@11@@danf@17-8-2009 10161110@unknown@formal@none@1@S@* [[Tcl]] - A dynamic programming language, suitable for a wide range of uses, including web and desktop applications, networking, administration, testing and many more.@@@@1@25@@danf@17-8-2009 10161120@unknown@formal@none@1@S@==== Video games ====@@@@1@4@@danf@17-8-2009 10161130@unknown@formal@none@1@S@Cross-platform is a term that can also apply to [[video game]]s.@@@@1@11@@danf@17-8-2009 10161140@unknown@formal@none@1@S@Such games are released on a range of [[video game console]]s and [[handheld game console]]s, which are specialized [[computer]]s dedicated to the task of playing games (and thus, are a platform like any other computer).@@@@1@35@@danf@17-8-2009 10161150@unknown@formal@none@1@S@Examples of these games include:@@@@1@5@@danf@17-8-2009 10161160@unknown@formal@none@1@S@* [[Miner 2049er]], the first major multiplatform game@@@@1@8@@danf@17-8-2009 10161170@unknown@formal@none@1@S@* [[Phantasy Star Online]]@@@@1@4@@danf@17-8-2009 10161180@unknown@formal@none@1@S@* [[Lara Croft Tomb Raider: Legend]]@@@@1@6@@danf@17-8-2009 10161190@unknown@formal@none@1@S@* [[FIFA Series]]@@@@1@3@@danf@17-8-2009 10161200@unknown@formal@none@1@S@* [[Shadow of Legend]]@@@@1@4@@danf@17-8-2009 10161210@unknown@formal@none@1@S@… which are spread across a variety of platforms, such as the [[Nintendo GameCube]], [[PlayStation 2]], [[Xbox]], [[Personal computer|PC]], and [[mobile devices]].@@@@1@22@@danf@17-8-2009 10161220@unknown@formal@none@1@S@In some cases, depending on the hardware of a particular system, it may take longer than expected to create a video game across multiple platforms.@@@@1@25@@danf@17-8-2009 10161230@unknown@formal@none@1@S@So, a video game may only get released on a few platforms and then later released on the remaining platforms.@@@@1@20@@danf@17-8-2009 10161240@unknown@formal@none@1@S@Typically, this is what occurs when a new system is released, because the [[Video game developer|developer]]s of the video game need to become acquainted with the hardware and
software associated with the new console.@@@@1@34@@danf@17-8-2009 10161250@unknown@formal@none@1@S@Some games may not become cross-platform because of licensing agreements between the [[Video game developer|developer]]s and the maker of the [[video game console]] which state that the game will only be made for one particular console.@@@@1@36@@danf@17-8-2009 10161260@unknown@formal@none@1@S@As an example, [[Disney]] could create a new game and wish to release it on the latest [[Nintendo]] and [[Sony]] game consoles.@@@@1@22@@danf@17-8-2009 10161270@unknown@formal@none@1@S@If [[Disney]] licenses the game with [[Sony]] first, [[Disney]] may be required to only release the game on [[Sony|Sony’s]] console for a short time, or indefinitely—effectively prohibiting the game from cross-platform at least for a period of time.@@@@1@38@@danf@17-8-2009 10161280@unknown@formal@none@1@S@Several developers have developed ways to play games online while using different platforms.@@@@1@13@@danf@17-8-2009 10161290@unknown@formal@none@1@S@Epic Games, Microsoft and Valve Software all have this technology, that allows Xbox 360 gamers and PS3 gamers to play with PC gamers, allowing gamers to finally decide which platform is the best for a game.@@@@1@36@@danf@17-8-2009 10161300@unknown@formal@none@1@S@The first game released to allow this interactivity between PC and Console games was [[Quake 3]].@@@@1@16@@danf@17-8-2009 10161310@unknown@formal@none@1@S@Games that feature cross-platform online play include:@@@@1@7@@danf@17-8-2009 10161320@unknown@formal@none@1@S@*[[Champions Online]]@@@@1@2@@danf@17-8-2009 10161330@unknown@formal@none@1@S@*[[Lost Planet: Colonies]]@@@@1@3@@danf@17-8-2009 10161340@unknown@formal@none@1@S@*[[Phantasy Star Online]]@@@@1@3@@danf@17-8-2009 10161350@unknown@formal@none@1@S@*[[Shadowrun (2007 video game)|Shadowrun]]@@@@1@4@@danf@17-8-2009 10161360@unknown@formal@none@1@S@*[[UNO (Xbox Live Arcade)|UNO]]@@@@1@4@@danf@17-8-2009 10161370@unknown@formal@none@1@S@*[[Final Fantasy XI Online]]@@@@1@4@@danf@17-8-2009 10161380@unknown@formal@none@1@S@== Platform independent software ==@@@@1@5@@danf@17-8-2009 10161390@unknown@formal@none@1@S@Software that is platform independent does not rely on any special features of any single platform, or, if it does, handles those special features such that it can deal with multiple platforms.@@@@1@32@@danf@17-8-2009 10161400@unknown@formal@none@1@S@All [[algorithm]]s, such as the [[quicksort]] algorithm, are able to be implemented on different platforms.@@@@1@15@@danf@17-8-2009 10161410@unknown@formal@none@1@S@== Cross-platform programming ==@@@@1@4@@danf@17-8-2009 10161420@unknown@formal@none@1@S@Cross-platform programming is the practice of actively writing software that will work on more than one platform.@@@@1@17@@danf@17-8-2009 10161430@unknown@formal@none@1@S@=== Approaches to cross-platform programming ===@@@@1@6@@danf@17-8-2009 10161440@unknown@formal@none@1@S@There are different ways of approaching the problem of writing a cross-platform application program.@@@@1@14@@danf@17-8-2009 10161450@unknown@formal@none@1@S@One such approach is simply to create multiple versions of the same program in different ''source trees''—in other words, the [[Microsoft Windows|Windows]] version of a program might have one set of source code files and the [[Apple Macintosh|Macintosh]] version might have another, while a FOSS *nix system might have another.@@@@1@50@@danf@17-8-2009 10161460@unknown@formal@none@1@S@While this is a straightforward approach to the problem, 
it has the potential to be considerably more expensive in development cost, development time, or both, especially for the corporate entities.@@@@1@30@@danf@17-8-2009 10161470@unknown@formal@none@1@S@The idea behind this is to create more than two different programs that have the ability to behave similarly to each other.@@@@1@22@@danf@17-8-2009 10161480@unknown@formal@none@1@S@It is also possible that this means of developing a cross-platform application will result in more problems with bug tracking and fixing, because the two different ''source trees'' would have different programmers, and thus different defects in each version.@@@@1@39@@danf@17-8-2009 10161490@unknown@formal@none@1@S@The smaller the programming team, the quicker the bug fixes tend to be.@@@@1@13@@danf@17-8-2009 10161500@unknown@formal@none@1@S@Another approach that is used is to depend on pre-existing software that hides the differences between the platforms—called [[abstraction]] of the platform—such that the program itself is unaware of the platform it is running on.@@@@1@35@@danf@17-8-2009 10161510@unknown@formal@none@1@S@It could be said that such programs are ''platform agnostic''.@@@@1@10@@danf@17-8-2009 10161520@unknown@formal@none@1@S@Programs that run on the [[Java (Sun)|Java]] [[Virtual Machine]] ([[Java Virtual Machine|JVM]]) are built in this fashion.@@@@1@17@@danf@17-8-2009 10161530@unknown@formal@none@1@S@Some applications mix various methods of cross-platform programming to create the final application.@@@@1@13@@danf@17-8-2009 10161540@unknown@formal@none@1@S@An example of this is the [[Firefox]] [[web browser]], which uses [[abstraction]] to build some of the lower-level components, separate source subtrees for implementing platform specific features (like the GUI), and the implementation of more than one [[scripting language]] to help facilitate ease of portability.@@@@1@45@@danf@17-8-2009 10161550@unknown@formal@none@1@S@[[Firefox]] implements [[XUL]], [[Cascading Style Sheets|CSS]] and [[JavaScript]] for extending the browser, in addition to classic [[Netscape]]-style browser plugins.@@@@1@19@@danf@17-8-2009 10161560@unknown@formal@none@1@S@Much of the browser itself is written in XUL, CSS, and JavaScript, as well.@@@@1@14@@danf@17-8-2009 10161570@unknown@formal@none@1@S@=== Cross-platform programming toolkits ===@@@@1@5@@danf@17-8-2009 10161580@unknown@formal@none@1@S@There are a number of tools which are available to help facilitate the process of cross-platform programming:@@@@1@17@@danf@17-8-2009 10161590@unknown@formal@none@1@S@* [[Simple DirectMedia Layer]]—An [[open source]] cross-platform multimedia library written in C that creates an abstraction over various platforms’ graphics, sound, and input [[Application programming interface|API]]s.@@@@1@26@@danf@17-8-2009 10161600@unknown@formal@none@1@S@It runs on many operating systems including Linux, Windows and Mac OS X and is aimed at games and multimedia applications.@@@@1@21@@danf@17-8-2009 10161610@unknown@formal@none@1@S@* [[Cairo (graphics)|Cairo]]−A [[free software]] library used to provide a vector graphics-based, device-independent API.@@@@1@14@@danf@17-8-2009 10161620@unknown@formal@none@1@S@It is designed to provide primitives for 2-dimensional drawing across a number of different backends.@@@@1@15@@danf@17-8-2009 10161630@unknown@formal@none@1@S@Cairo is written in C and has bindings for many programming languages.@@@@1@12@@danf@17-8-2009 10161640@unknown@formal@none@1@S@* ''ParaGUI''—ParaGUI is a cross-platform high-level application 
framework and GUI library.@@@@1@11@@danf@17-8-2009 10161650@unknown@formal@none@1@S@It can be compiled on various platforms (Linux, Win32, BeOS, Mac OS, ...).@@@@1@12@@danf@17-8-2009 10161660@unknown@formal@none@1@S@ParaGUI is based on the Simple DirectMedia Layer (SDL).@@@@1@9@@danf@17-8-2009 10161670@unknown@formal@none@1@S@ParaGUI is targeted at cross-platform multimedia applications and embedded devices operating on framebuffer displays.@@@@1@14@@danf@17-8-2009 10161680@unknown@formal@none@1@S@* [[wxWidgets]]—An open source widget toolkit that is also an [[application framework]].@@@@1@12@@danf@17-8-2009 10161690@unknown@formal@none@1@S@It runs on [[Unix-like]] systems with [[X11]], Microsoft Windows and Mac OS X. It permits applications written with it to run on all of the systems that it supports, provided the application does not use any [[operating system]]-specific programming in addition to it.@@@@1@44@@danf@17-8-2009 10161700@unknown@formal@none@1@S@* [[Qt (toolkit)|Qt]]—An application framework and [[widget toolkit]] for [[Unix-like]] systems with [[X11]], Microsoft Windows, Mac OS X, and other systems—available under both [[open source]] and commercial licenses.@@@@1@28@@danf@17-8-2009 10161710@unknown@formal@none@1@S@* [[GTK+]]—An open source widget toolkit for Unix-like systems with X11 and Microsoft Windows.@@@@1@14@@danf@17-8-2009 10161720@unknown@formal@none@1@S@* [[FLTK]]—Another open source cross-platform toolkit, but more lightweight because it restricts itself to the GUI.@@@@1@18@@danf@17-8-2009 10161730@unknown@formal@none@1@S@* [[Mozilla application framework|Mozilla]]—An open source platform for building Mac, Windows and Linux applications.@@@@1@14@@danf@17-8-2009 10161740@unknown@formal@none@1@S@* [[Mono (software)|Mono]] (and more specifically, [[Microsoft .NET]])—A cross-platform framework for applications and programming languages.@@@@1@15@@danf@17-8-2009 10161750@unknown@formal@none@1@S@* ''molib''—A robust commercial application toolkit library that abstracts the system calls through C++ objects (such as the file system, database system and thread implementation).@@@@1@25@@danf@17-8-2009 10161760@unknown@formal@none@1@S@This allows for the creation of applications that compile and run under Microsoft Windows, Mac OS X, GNU/Linux, and other Unix systems (Sun OS, AIX, HP-UX, 32/64 bit, SMP).@@@@1@28@@danf@17-8-2009 10161770@unknown@formal@none@1@S@It can be used in concert with ''the sandbox'' to create GUI-based applications.@@@@1@10@@danf@17-8-2009 10161780@unknown@formal@none@1@S@* [[fpGUI]] - An open source widget toolkit that is completely implemented in Object Pascal.@@@@1@15@@danf@17-8-2009 10161790@unknown@formal@none@1@S@It currently supports Linux, Windows and, to a limited extent, Windows CE.@@@@1@11@@danf@17-8-2009 10161795@unknown@formal@none@1@S@fpGUI does not rely on any large libraries; instead, it talks directly to Xlib (Linux) or GDI (Windows).@@@@1@18@@danf@17-8-2009 10161800@unknown@formal@none@1@S@The framework is compiled with the Free Pascal compiler.@@@@1@9@@danf@17-8-2009 10161810@unknown@formal@none@1@S@Mac OS support is also in the works.@@@@1@8@@danf@17-8-2009 10161820@unknown@formal@none@1@S@* [[Tcl/Tk]] - Tcl (Tool Command Language) is a dynamic programming language, suitable for a wide range of uses, including web and desktop applications, networking, administration, testing and many more.@@@@1@30@@danf@17-8-2009 10161830@unknown@formal@none@1@S@Open source and business-friendly, Tcl is a mature yet evolving language that is truly cross-platform, easily deployed and 
highly extensible.@@@@1@21@@danf@17-8-2009 10161840@unknown@formal@none@1@S@Tk is a graphical user interface toolkit that takes developing desktop applications to a higher level than conventional approaches.@@@@1@19@@danf@17-8-2009 10161850@unknown@formal@none@1@S@Tk is the standard GUI not only for Tcl, but for many other dynamic languages, and can produce rich, native applications that run unchanged across Windows, Mac OS X, Linux and more.@@@@1@32@@danf@17-8-2009 10161860@unknown@formal@none@1@S@The combination of Tcl and the Tk GUI toolkit is referred to as Tcl/Tk.@@@@1@14@@danf@17-8-2009 10161870@unknown@formal@none@1@S@* [[XVT]] is a cross-platform toolkit for creating enterprise and desktop applications in C/C++ on Windows, Linux and Unix (Solaris, HPUX, AIX), and Mac.@@@@1@24@@danf@17-8-2009 10161880@unknown@formal@none@1@S@Most recent release is 5.8, in April 2007@@@@1@8@@danf@17-8-2009 10161890@unknown@formal@none@1@S@=== Cross-platform development environments ===@@@@1@5@@danf@17-8-2009 10161900@unknown@formal@none@1@S@Cross-platform applications can also be built using proprietary [[Integrated development environment|IDE]]s, or so-called [[Rapid Application Development]] tools.@@@@1@17@@danf@17-8-2009 10161910@unknown@formal@none@1@S@There are a number of development environments which allow developers to build and deploy applications across multiple platforms:@@@@1@18@@danf@17-8-2009 10161920@unknown@formal@none@1@S@* [[Eclipse (software)| Eclipse]]—An Open source [[software framework]] and [[Integrated development environment|IDE]] extendable through plug-ins including the C++ Development Toolkit.@@@@1@20@@danf@17-8-2009 10161930@unknown@formal@none@1@S@Eclipse is available on any operating system with a modern Java virtual machine (including Windows, Linux, and Mac OS X, Sun, HP-UX, and other systems).@@@@1@25@@danf@17-8-2009 10161940@unknown@formal@none@1@S@* [[IntelliJ IDEA]]—A proprietary [[Integrated development environment|IDE]]@@@@1@7@@danf@17-8-2009 10161950@unknown@formal@none@1@S@* [[NetBeans]]—An Open source [[software framework]] and [[Integrated development environment|IDE]] extendable through plug-ins.@@@@1@13@@danf@17-8-2009 10161960@unknown@formal@none@1@S@NetBeans is available on any operating system with a modern Java virtual machine (including Windows, Linux, and Mac OS X, Sun, HP-UX, and other systems).@@@@1@25@@danf@17-8-2009 10161970@unknown@formal@none@1@S@Similar to Eclipse in features and functionality.@@@@1@7@@danf@17-8-2009 10161980@unknown@formal@none@1@S@Promoted by [[Sun Microsystems]]@@@@1@4@@danf@17-8-2009 10161990@unknown@formal@none@1@S@* [[Omnis Studio]]—A proprietary [[Integrated development environment|IDE]] or Rapid Application Development tool for creating enterprise and web applications for Windows, Linux, and Mac OS X.@@@@1@25@@danf@17-8-2009 10162000@unknown@formal@none@1@S@* [[Runtime Revolution]]—a proprietary [[Integrated development environment|IDE]], compiler engine and CGI builder that [[cross compile]]s to [[Microsoft Windows|Windows]], [[Mac OS X]] ([[PowerPC|PPC]], [[Intel]]), [[Linux]], [[Solaris Operating System|Solaris]], [[BSD]], and [[Irix]].@@@@1@30@@danf@17-8-2009 10162010@unknown@formal@none@1@S@*[[Code::Blocks]]—A free/open source, cross platform IDE.@@@@1@6@@danf@17-8-2009 10162020@unknown@formal@none@1@S@It is developed in C++ using wxWidgets.@@@@1@7@@danf@17-8-2009 10162030@unknown@formal@none@1@S@Using a plugin architecture, its capabilities and features are defined by the provided 
plugins.@@@@1@14@@danf@17-8-2009 10162040@unknown@formal@none@1@S@*[[Lazarus (software)]]—Lazarus is a cross platform Visual IDE developed for and supported by the open source Free Pascal compiler.@@@@1@19@@danf@17-8-2009 10162050@unknown@formal@none@1@S@It aims to provide a Rapid Application Development Delphi Clone for Pascal and Object Pascal developers.@@@@1@16@@danf@17-8-2009 10162060@unknown@formal@none@1@S@*[[REALbasic]]—REALbasic (RB) is an object-oriented dialect of the BASIC programming language developed and commercially marketed by REAL Software, Inc in Austin, Texas for Mac OS X, Microsoft Windows, and Linux.@@@@1@30@@danf@17-8-2009 10162070@unknown@formal@none@1@S@== Criticisms of cross-platform development ==@@@@1@6@@danf@17-8-2009 10162080@unknown@formal@none@1@S@There are certain issues associated with cross-platform development.@@@@1@8@@danf@17-8-2009 10162090@unknown@formal@none@1@S@Some of these include:@@@@1@4@@danf@17-8-2009 10162100@unknown@formal@none@1@S@* Testing cross-platform applications may also be considerably more complicated, since different platforms can exhibit slightly different behaviors or subtle bugs.@@@@1@21@@danf@17-8-2009 10162110@unknown@formal@none@1@S@This problem has led some developers to deride cross-platform development as “Write Once, Debug Everywhere”, a take on Sun’s [[Write once, run anywhere|“Write Once, Run Anywhere”]] marketing slogan.@@@@1@28@@danf@17-8-2009 10162120@unknown@formal@none@1@S@* Developers are often restricted to using the [[lowest common denominator]] subset of features which are available on all platforms.@@@@1@20@@danf@17-8-2009 10162130@unknown@formal@none@1@S@This may hinder the application's performance or prohibit developers from using platforms’ most advanced features.@@@@1@15@@danf@17-8-2009 10162140@unknown@formal@none@1@S@* Different platforms often have different user interface conventions, which cross-platform applications do not always accommodate.@@@@1@16@@danf@17-8-2009 10162150@unknown@formal@none@1@S@For example, applications developed for Mac OS X and [[GNOME]] are supposed to place the most important button on the right-hand side of windows and dialogs, whereas Microsoft Windows and [[KDE]] have the opposite convention.@@@@1@35@@danf@17-8-2009 10162160@unknown@formal@none@1@S@Though many of these differences are subtle, a cross-platform application which does not conform appropriately to these conventions may feel clunky or alien to the user.@@@@1@26@@danf@17-8-2009 10162170@unknown@formal@none@1@S@When working quickly, such opposing conventions may even result in [[data loss]], such as in a [[dialog box]] confirming whether the user wants to save or discard changes to a file.@@@@1@31@@danf@17-8-2009 10162180@unknown@formal@none@1@S@* Scripting languages and virtual machines must be translated into native executable code each time the application is executed, imposing a performance penalty.@@@@1@23@@danf@17-8-2009 10162190@unknown@formal@none@1@S@This performance hit can be alleviated using advanced techniques like [[just-in-time compilation]]; but even using such techniques, some performance overhead may be unavoidable.@@@@1@23@@danf@17-8-2009 10170010@unknown@formal@none@1@S@
Data
@@@@1@1@@danf@17-8-2009 10170020@unknown@formal@none@1@S@'''Data''' (singular: '''datum''') are collected descriptors of natural phenomena, including the results of [[experience]], [[observation]] or [[experiment]], or a set of [[premise]]s.@@@@1@22@@danf@17-8-2009 10170030@unknown@formal@none@1@S@This may consist of [[number]]s, [[word]]s, or [[image]]s, particularly as [[measurement]]s or observations of a set of [[variable]]s.@@@@1@18@@danf@17-8-2009 10170040@unknown@formal@none@1@S@==Etymology==@@@@1@1@@danf@17-8-2009 10170050@unknown@formal@none@1@S@The word ''data'' is the plural of [[Latin]] ''[[datum]]'', [[Grammatical gender|neuter]] past [[participle]] of ''dare'', "to give", hence "something given".@@@@1@20@@danf@17-8-2009 10170060@unknown@formal@none@1@S@The [[past participle]] of "to give" has been used for millennia, in the sense of a statement accepted at face value; one of the works of [[Euclid]], circa 300 BC, was the ''Dedomena'' (in Latin, ''Data'').@@@@1@36@@danf@17-8-2009 10170070@unknown@formal@none@1@S@In discussions of problems in [[geometry]], [[mathematics]], [[engineering]], and so on, the terms ''givens'' and ''data'' are used interchangeably.@@@@1@19@@danf@17-8-2009 10170080@unknown@formal@none@1@S@Such usage is the origin of ''data'' as a concept in [[computer science]]: ''data'' are numbers, words, images, etc., accepted as they stand.@@@@1@23@@danf@17-8-2009 10170090@unknown@formal@none@1@S@The word is pronounced dey-tuh, dat-uh, or dah-tuh.@@@@1@5@@danf@17-8-2009 10170100@unknown@formal@none@1@S@[[Experimental data]] are data generated within the context of a scientific investigation.@@@@1@12@@danf@17-8-2009 10170110@unknown@formal@none@1@S@Mathematically, data can be grouped in many ways.@@@@1@8@@danf@17-8-2009 10170120@unknown@formal@none@1@S@==Usage in English==@@@@1@3@@danf@17-8-2009 10170130@unknown@formal@none@1@S@In [[English language|English]], the word ''datum'' is still used in the general sense of "something given", and more specifically in [[cartography]], [[geography]], [[geology]], [[NMR]] and [[technical drawing|drafting]] to mean a reference point, reference line, or reference surface.@@@@1@37@@danf@17-8-2009 10170140@unknown@formal@none@1@S@More generally speaking, any measurement or result can be called a (single) ''datum'', but ''data point'' is more common.@@@@1@19@@danf@17-8-2009 10170150@unknown@formal@none@1@S@Both ''datums'' (see usage in [[datum]] article) and the originally Latin plural ''data'' are used as the plural of ''datum'' in English, but ''data'' is more commonly treated as a [[mass noun]] and used in the [[Grammatical number|singular]], especially in day-to-day usage.@@@@1@42@@danf@17-8-2009 10170160@unknown@formal@none@1@S@For example, "This is all the data from the experiment".@@@@1@10@@danf@17-8-2009 10170170@unknown@formal@none@1@S@This usage is inconsistent with the rules of Latin grammar and traditional English, which would instead suggest "These are all the data from the experiment".@@@@1@25@@danf@17-8-2009 10170180@unknown@formal@none@1@S@Some British and UN academic, scientific, and professional [[style guides]] (e.g., see page 43 of the [http://whqlibdoc.who.int/hq/2004/WHO_IMD_PUB_04.1.pdf World Health Organization Style Guide]) request that authors treat ''data'' as a plural noun.@@@@1@31@@danf@17-8-2009 10170190@unknown@formal@none@1@S@Other international organizations, such as the IEEE Computer Society, allow its usage as either a mass noun or a plural based on author preference.@@@@1@24@@danf@17-8-2009 
10170200@unknown@formal@none@1@S@It is now usually treated as a singular mass noun in informal usage, but usage in scientific publications shows a strong UK/U.S divide.@@@@1@23@@danf@17-8-2009 10170210@unknown@formal@none@1@S@U.S. usage tends to treat ''data'' in the singular, including in serious and academic publishing, although some major newspapers (such as the [[New York Times]]) regularly use it in the plural.@@@@1@31@@danf@17-8-2009 10170220@unknown@formal@none@1@S@"The plural usage is still common, as this headline from the New York Times attests: “Data Are Elusive on the Homeless.”@@@@1@21@@danf@17-8-2009 10170230@unknown@formal@none@1@S@Sometimes scientists think of data as plural, as in ''These data do not support the conclusions.''@@@@1@16@@danf@17-8-2009 10170240@unknown@formal@none@1@S@But more often scientists and researchers think of data as a singular mass entity like information, and most people now follow this in general usage.@@@@1@25@@danf@17-8-2009 10170245@unknown@formal@none@1@S@"[http://www.bartleby.com/61/51/D0035100.html] UK usage now widely accepts treating ''data'' as singular in standard English, including everyday newspaper usage at least in non-scientific use.@@@@1@22@@danf@17-8-2009 10170250@unknown@formal@none@1@S@UK scientific publishing usually still prefers treating it as a plural..@@@@1@11@@danf@17-8-2009 10170260@unknown@formal@none@1@S@Some UK university style guides recommend using ''data'' for both singular and plural use and some recommend treating it only as a singular in connection with computers.@@@@1@27@@danf@17-8-2009 10170270@unknown@formal@none@1@S@==Uses of ''data'' in science and computing==@@@@1@7@@danf@17-8-2009 10170280@unknown@formal@none@1@S@''Raw data'' are [[number]]s, [[character (computing)|characters]], [[image]]s or other outputs from devices to convert physical quantities into symbols, in a very broad sense.@@@@1@23@@danf@17-8-2009 10170290@unknown@formal@none@1@S@Such data are typically further [[data processing|processed]] by a human or [[input]] into a [[computer]], [[Computer storage|stored]] and processed there, or transmitted ([[output]]) to another human or computer.@@@@1@28@@danf@17-8-2009 10170300@unknown@formal@none@1@S@''Raw data'' is a relative term; data processing commonly occurs by stages, and the "processed data" from one stage may be considered the "raw data" of the next.@@@@1@28@@danf@17-8-2009 10170310@unknown@formal@none@1@S@Mechanical computing devices are classified according to the means by which they represent data.@@@@1@14@@danf@17-8-2009 10170320@unknown@formal@none@1@S@An [[analog computer]] represents a datum as a voltage, distance, position, or other physical quantity.@@@@1@15@@danf@17-8-2009 10170330@unknown@formal@none@1@S@A [[digital computer]] represents a datum as a sequence of symbols drawn from a fixed [[alphabet]].@@@@1@16@@danf@17-8-2009 10170340@unknown@formal@none@1@S@The most common digital computers use a binary alphabet, that is, an alphabet of two characters, typically denoted "0" and "1".@@@@1@21@@danf@17-8-2009 10170350@unknown@formal@none@1@S@More familiar representations, such as numbers or letters, are then constructed from the binary alphabet.@@@@1@15@@danf@17-8-2009 10170360@unknown@formal@none@1@S@Some special forms of data are distinguished.@@@@1@7@@danf@17-8-2009 10170370@unknown@formal@none@1@S@A [[computer program]] is a collection of data, which can be interpreted as instructions.@@@@1@14@@danf@17-8-2009 10170380@unknown@formal@none@1@S@Most computer languages make a 
distinction between programs and the other data on which programs operate, but in some languages, notably [[Lisp programming language|Lisp]] and similar languages, programs are essentially indistinguishable from other data.@@@@1@34@@danf@17-8-2009 10170390@unknown@formal@none@1@S@It is also useful to distinguish [[metadata]], that is, a description of other data.@@@@1@14@@danf@17-8-2009 10170400@unknown@formal@none@1@S@A similar yet earlier term for metadata is "ancillary data."@@@@1@10@@danf@17-8-2009 10170410@unknown@formal@none@1@S@The prototypical example of metadata is the library catalog, which is a description of the contents of books.@@@@1@18@@danf@17-8-2009 10170420@unknown@formal@none@1@S@==Meaning of data, information and knowledge==@@@@1@6@@danf@17-8-2009 10170430@unknown@formal@none@1@S@The terms [[information]] and [[knowledge]] are frequently used for overlapping concepts.@@@@1@11@@danf@17-8-2009 10170440@unknown@formal@none@1@S@The main difference is in the level of [[abstraction]] being considered.@@@@1@11@@danf@17-8-2009 10170450@unknown@formal@none@1@S@Data is the lowest level of abstraction, information is the next level, and finally, knowledge is the highest level among all three.@@@@1@22@@danf@17-8-2009 10170460@unknown@formal@none@1@S@For example, the height of Mt. Everest is generally considered as "data", a book on Mt. Everest geological characteristics may be considered as "information", and a report containing practical information on the best way to reach Mt. Everest's peak may be considered as "knowledge".@@@@1@44@@danf@17-8-2009 10170470@unknown@formal@none@1@S@Information as a concept bears a diversity of meanings, from everyday usage to technical settings.@@@@1@15@@danf@17-8-2009 10170480@unknown@formal@none@1@S@Generally speaking, the concept of information is closely related to notions of constraint, communication, control, data, form, instruction, knowledge, meaning, mental stimulus, pattern, perception, and representation.@@@@1@26@@danf@17-8-2009 10170490@unknown@formal@none@1@S@Beynon-Davies uses the concept of a [[sign]] to distinguish between [[data]] and [[information]].@@@@1@13@@danf@17-8-2009 10170500@unknown@formal@none@1@S@Data are symbols.@@@@1@3@@danf@17-8-2009 10170510@unknown@formal@none@1@S@Information occurs when symbols are used to refer to something.@@@@1@10@@danf@17-8-2009 10180010@unknown@formal@none@1@S@
Data analysis
@@@@1@2@@danf@17-8-2009 10180020@unknown@formal@none@1@S@'''Data analysis''' is the process of looking at and summarizing '''[[data]]''' with the intent to extract useful [[information]] and develop conclusions.@@@@1@21@@danf@17-8-2009 10180030@unknown@formal@none@1@S@Data analysis is closely related to [[data mining]], but data mining tends to focus on larger data sets, with less emphasis on making [[inference]]s, and often uses data that was originally collected for a different purpose.@@@@1@36@@danf@17-8-2009 10180040@unknown@formal@none@1@S@In [[statistics|statistical applications]], some people divide data analysis into [[descriptive statistics]], [[exploratory data analysis]] and [[confirmatory data analysis]], where the EDA focuses on discovering new features in the data, and CDA on confirming or falsifying existing hypotheses.@@@@1@37@@danf@17-8-2009 10180050@unknown@formal@none@1@S@Data analysis assumes different aspects, and possibly different names, in different fields.@@@@1@12@@danf@17-8-2009 10180060@unknown@formal@none@1@S@The term ''data analysis'' is also used as a synonym for [[data modeling]], which is unrelated to the subject of this article.@@@@1@22@@danf@17-8-2009 10180070@unknown@formal@none@1@S@==Nuclear and particle physics==@@@@1@4@@danf@17-8-2009 10180080@unknown@formal@none@1@S@In [[nuclear physics|nuclear]] and [[particle physics]], the data usually originate from the [[particle detector|experimental apparatus]] via a [[data acquisition]] system.@@@@1@20@@danf@17-8-2009 10180090@unknown@formal@none@1@S@It is then processed, in a step usually called ''data reduction'', to apply calibrations and to extract physically significant information.@@@@1@20@@danf@17-8-2009 10180100@unknown@formal@none@1@S@Data reduction is most often, especially in large particle physics experiments, an automatic, batch-mode operation carried out by software written ad hoc.@@@@1@21@@danf@17-8-2009 10180110@unknown@formal@none@1@S@The resulting data ''n-tuples'' are then scrutinized by the physicists, using specialized software tools like [[ROOT]] or [[Physics Analysis Workstation|PAW]], comparing the results of the experiment with theory.@@@@1@28@@danf@17-8-2009 10180120@unknown@formal@none@1@S@The theoretical models are often difficult to compare directly with the results of the experiments, so they are used instead as input for [[Monte Carlo method|Monte Carlo simulation]] software like [[Geant4]] that predicts the response of the detector to a given theoretical event, producing '''simulated events''' which are then compared to experimental data.@@@@1@53@@danf@17-8-2009 10180130@unknown@formal@none@1@S@See also: [[Computational physics]].@@@@1@4@@danf@17-8-2009 10180140@unknown@formal@none@1@S@==Social sciences==@@@@1@2@@danf@17-8-2009 10180150@unknown@formal@none@1@S@[[Qualitative data analysis]] (QDA) or [[qualitative research]] is the analysis of non-numerical data, for example words, photographs, observations, etc.@@@@1@19@@danf@17-8-2009 10180160@unknown@formal@none@1@S@==Information technology==@@@@1@2@@danf@17-8-2009 10180170@unknown@formal@none@1@S@A special case is the [[Data analysis (information technology)|data analysis in information technology audits]].@@@@1@17@@danf@17-8-2009 10180180@unknown@formal@none@1@S@==Business==@@@@1@1@@danf@17-8-2009 10180190@unknown@formal@none@1@S@See@@@@1@1@@danf@17-8-2009 10180200@unknown@formal@none@1@S@* [[Analytics]]@@@@1@2@@danf@17-8-2009 10180210@unknown@formal@none@1@S@* [[Business intelligence]]@@@@1@3@@danf@17-8-2009 
10180220@unknown@formal@none@1@S@* [[Data mining]]@@@@1@3@@danf@17-8-2009 10190010@unknown@formal@none@1@S@
Database
@@@@1@1@@danf@17-8-2009 10190020@unknown@formal@none@1@S@A '''database''' is a [[structure]]d collection of records or [[data]].@@@@1@10@@danf@17-8-2009 10190030@unknown@formal@none@1@S@A [[computer]] database relies upon [[software]] to organize the storage of data.@@@@1@12@@danf@17-8-2009 10190040@unknown@formal@none@1@S@The software models the database structure in what are known as [[database model]]s.@@@@1@13@@danf@17-8-2009 10190050@unknown@formal@none@1@S@The model in most common use today is the [[relational model]].@@@@1@11@@danf@17-8-2009 10190060@unknown@formal@none@1@S@Other models such as the [[hierarchical model]] and the [[network model]] use a more explicit representation of relationships (see below for explanation of the various database models).@@@@1@27@@danf@17-8-2009 10190070@unknown@formal@none@1@S@Database management systems (DBMS) are the software used to organize and maintain the database.@@@@1@14@@danf@17-8-2009 10190080@unknown@formal@none@1@S@These are categorized according to the [[database model]] that they support.@@@@1@11@@danf@17-8-2009 10190090@unknown@formal@none@1@S@The model tends to determine the query languages that are available to access the database.@@@@1@15@@danf@17-8-2009 10190100@unknown@formal@none@1@S@A great deal of the internal engineering of a DBMS, however, is independent of the data model, and is concerned with managing factors such as performance, concurrency, integrity, and recovery from [[hardware failure]]s.@@@@1@33@@danf@17-8-2009 10190110@unknown@formal@none@1@S@In these areas there are large differences between products.@@@@1@9@@danf@17-8-2009 10190120@unknown@formal@none@1@S@==History==@@@@1@1@@danf@17-8-2009 10190130@unknown@formal@none@1@S@The earliest known use of the term '''''data base''''' was in November 1963, when the [[System Development Corporation]] sponsored a symposium under the title ''Development and Management of a Computer-centered Data Base''.@@@@1@32@@danf@17-8-2009 10190140@unknown@formal@none@1@S@'''Database''' as a single word became common in Europe in the early 1970s and by the end of the decade it was being used in major American newspapers.@@@@1@28@@danf@17-8-2009 10190150@unknown@formal@none@1@S@(The abbreviation DB, however, survives.)@@@@1@5@@danf@17-8-2009 10190160@unknown@formal@none@1@S@The first database management systems were developed in the 1960s.@@@@1@10@@danf@17-8-2009 10190170@unknown@formal@none@1@S@A pioneer in the field was [[Charles Bachman]].@@@@1@8@@danf@17-8-2009 10190180@unknown@formal@none@1@S@Bachman's early papers show that his aim was to make more effective use of the new direct access storage devices becoming available: until then, data processing had been based on [[punch card|punched cards]] and [[magnetic tape]], so that serial processing was the dominant activity.@@@@1@44@@danf@17-8-2009 10190190@unknown@formal@none@1@S@Two key [[data model]]s arose at this time: [[CODASYL]] developed the [[network model]] based on Bachman's ideas, and (apparently independently) the [[hierarchical model]] was used in a system developed by [[North American Rockwell]] later adopted by [[IBM]] as the cornerstone of their [[Information Management System|IMS]] product.@@@@1@46@@danf@17-8-2009 10190200@unknown@formal@none@1@S@While IMS along with the CODASYL [[IDMS]] were the big, high visibility databases developed in the 1960s, several others were also born in that decade, some of which have a significant installed base today.@@@@1@34@@danf@17-8-2009 10190210@unknown@formal@none@1@S@Two worthy of 
mention are the [[Pick operating system|PICK]] and [[MUMPS]] databases, with the former developed originally as an operating system with an embedded database and the latter as a programming language and database for the development of healthcare systems.@@@@1@40@@danf@17-8-2009 10190220@unknown@formal@none@1@S@The [[relational model]] was proposed by [[Edgar F. Codd|E. F. Codd]] in 1970.@@@@1@13@@danf@17-8-2009 10190230@unknown@formal@none@1@S@He criticized existing models for confusing the abstract description of information structure with descriptions of physical access mechanisms.@@@@1@18@@danf@17-8-2009 10190240@unknown@formal@none@1@S@For a long while, however, the relational model remained of academic interest only.@@@@1@13@@danf@17-8-2009 10190250@unknown@formal@none@1@S@While CODASYL products (IDMS) and network model products (IMS) were conceived as practical engineering solutions taking account of the technology as it existed at the time, the relational model took a much more theoretical perspective, arguing (correctly) that hardware and software technology would catch up in time.@@@@1@47@@danf@17-8-2009 10190260@unknown@formal@none@1@S@Among the first implementations were [[Michael Stonebraker]]'s [[Ingres (database)|Ingres]] at [[University of California, Berkeley|Berkeley]], and the [[System R]] project at IBM.@@@@1@21@@danf@17-8-2009 10190270@unknown@formal@none@1@S@Both of these were research prototypes, announced during 1976.@@@@1@9@@danf@17-8-2009 10190280@unknown@formal@none@1@S@The first commercial products, [[Oracle database|Oracle]] and [[IBM DB2|DB2]], did not appear until around 1980.@@@@1@15@@danf@17-8-2009 10190290@unknown@formal@none@1@S@The first successful database product for microcomputers was [[dBASE]] for the [[CP/M]] and [[PC-DOS]]/[[MS-DOS]] operating systems.@@@@1@16@@danf@17-8-2009 10190300@unknown@formal@none@1@S@During the 1980s, research activity focused on [[distributed database]] systems and [[database machine]]s.@@@@1@13@@danf@17-8-2009 10190310@unknown@formal@none@1@S@Another important theoretical idea was the [[Functional Data Model]], but apart from some specialized applications in genetics, molecular biology, and fraud investigation, the world took little notice.@@@@1@27@@danf@17-8-2009 10190320@unknown@formal@none@1@S@In the 1990s, attention shifted to [[OODB|object-oriented databases]].@@@@1@8@@danf@17-8-2009 10190330@unknown@formal@none@1@S@These had some success in fields where it was necessary to handle more complex data than relational systems could easily cope with, such as [[spatial database]]s, engineering data (including software [[Software repository|repositories]]), and multimedia data.@@@@1@35@@danf@17-8-2009 10190340@unknown@formal@none@1@S@Some of these ideas were adopted by the relational vendors, who integrated new features into their products as a result.@@@@1@20@@danf@17-8-2009 10190350@unknown@formal@none@1@S@The 1990s also saw the spread of [[Open Source]] databases, such as [[PostgreSQL]] and [[MySQL]].@@@@1@15@@danf@17-8-2009 10190360@unknown@formal@none@1@S@In the 2000s, the fashionable area for innovation is the [[XML database]].@@@@1@12@@danf@17-8-2009 10190370@unknown@formal@none@1@S@As with object databases, this has spawned a new collection of start-up companies, but at the same time the key ideas are being integrated into the established relational products.@@@@1@29@@danf@17-8-2009 10190380@unknown@formal@none@1@S@[[XML databases]] aim to remove the traditional divide between documents and data, allowing all of an 
organization's information resources to be held in one place, whether they are highly structured or not.@@@@1@32@@danf@17-8-2009 10190390@unknown@formal@none@1@S@==Database models==@@@@1@2@@danf@17-8-2009 10190400@unknown@formal@none@1@S@Various techniques are used to model data structure.@@@@1@8@@danf@17-8-2009 10190410@unknown@formal@none@1@S@Most database systems are built around one particular data model, although it is increasingly common for products to offer support for more than one model.@@@@1@25@@danf@17-8-2009 10190420@unknown@formal@none@1@S@For any one [[logical model]] various physical implementations may be possible, and most products will offer the user some level of control in tuning the [[physical implementation]], since the choices that are made have a significant effect on performance.@@@@1@39@@danf@17-8-2009 10190430@unknown@formal@none@1@S@Here are three examples:@@@@1@4@@danf@17-8-2009 10190440@unknown@formal@none@1@S@===Hierarchical model===@@@@1@2@@danf@17-8-2009 10190450@unknown@formal@none@1@S@In a [[hierarchical model]], data is organized into an inverted tree-like structure, implying a multiple downward link in each node to describe the nesting, and a sort field to keep the records in a particular order in each same-level list.@@@@1@40@@danf@17-8-2009 10190460@unknown@formal@none@1@S@This structure arranges the various data elements in a hierarchy and helps to establish logical relationships among data elements of multiple files.@@@@1@22@@danf@17-8-2009 10190470@unknown@formal@none@1@S@Each unit in the model is a record which is also known as a node.@@@@1@15@@danf@17-8-2009 10190480@unknown@formal@none@1@S@In such a model, each record on one level can be related to multiple records on the next lower level.@@@@1@20@@danf@17-8-2009 10190490@unknown@formal@none@1@S@A record that has subsidiary records is called a parent and the subsidiary records are called children.@@@@1@17@@danf@17-8-2009 10190500@unknown@formal@none@1@S@Data elements in this model are well suited for one-to-many relationships with other data elements in the database.@@@@1@18@@danf@17-8-2009 10190510@unknown@formal@none@1@S@This model is advantageous when the data elements are inherently hierarchical.@@@@1@11@@danf@17-8-2009 10190520@unknown@formal@none@1@S@The disadvantage is that in order to prepare the database it becomes necessary to identify the requisite groups of files that are to be logically integrated.@@@@1@26@@danf@17-8-2009 10190530@unknown@formal@none@1@S@Hence, a hierarchical data model may not always be flexible enough to accommodate the dynamic needs of an organization.@@@@1@19@@danf@17-8-2009 10190540@unknown@formal@none@1@S@===Network model===@@@@1@2@@danf@17-8-2009 10190550@unknown@formal@none@1@S@The [[network model]] tends to store records with links to other records.@@@@1@12@@danf@17-8-2009 10190560@unknown@formal@none@1@S@Each record in the database can have multiple parents, i.e., the relationships among data elements can have a many to many relationship.@@@@1@22@@danf@17-8-2009 10190570@unknown@formal@none@1@S@Associations are tracked via "pointers".@@@@1@5@@danf@17-8-2009 10190580@unknown@formal@none@1@S@These pointers can be node numbers or disk addresses.@@@@1@9@@danf@17-8-2009 10190590@unknown@formal@none@1@S@Most network databases tend to also include some form of hierarchical model.@@@@1@12@@danf@17-8-2009 10190600@unknown@formal@none@1@S@Databases can be translated from hierarchical model to network and vice versa.@@@@1@12@@danf@17-8-2009 
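As a rough illustration of the single-parent versus multiple-parent distinction discussed here, the following sketch uses toy, hypothetical Python classes (not taken from any actual DBMS) to contrast a hierarchical record with a network record:

<source lang="python">
# Illustrative sketch only: toy in-memory records showing the difference between
# a hierarchical (single-parent tree) record and a network (multi-parent, pointer-linked) record.

class HierarchicalNode:
    """A record in a hierarchical database: exactly one parent, many children."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent              # at most one parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

class NetworkRecord:
    """A record in a network database: links ("pointers") to any number of owners."""
    def __init__(self, name):
        self.name = name
        self.owners = []                  # a child may have several parents
        self.members = []

    def link_to(self, owner):
        self.owners.append(owner)         # many-to-many relationships become possible
        owner.members.append(self)

# Hierarchical: a part belongs to exactly one assembly.
engine = HierarchicalNode("engine")
piston = HierarchicalNode("piston", parent=engine)

# Network: the same part record can be owned by several assemblies at once.
engine_n = NetworkRecord("engine")
pump_n = NetworkRecord("pump")
bolt = NetworkRecord("bolt")
bolt.link_to(engine_n)
bolt.link_to(pump_n)

print([o.name for o in bolt.owners])      # ['engine', 'pump'] - multiple parents
print(piston.parent.name)                 # 'engine' - a single parent
</source>

In the hierarchical version the part can only be reached through its one parent, while in the network version the same part record is shared, via pointers, by several owners.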
10190610@unknown@formal@none@1@S@The main difference between the network model and hierarchical model is that in a network model, a child can have a number of parents whereas in a hierarchical model, a child can have only one parent.@@@@1@36@@danf@17-8-2009 10190620@unknown@formal@none@1@S@The network model provides greater advantage than the hierarchical model in that promotes greater flexibility and data accessibility, since records at a lower level can be accessed without accessing the records above them.@@@@1@33@@danf@17-8-2009 10190630@unknown@formal@none@1@S@This model is more efficient than hierarchical model, easier to understand and can be applied to many real world problems that require routine transactions.@@@@1@24@@danf@17-8-2009 10190640@unknown@formal@none@1@S@The disadvantages are that: It is a complex process to design and develop a network database; It has to be refined frequently; It requires that the relationships among all the records be defined before development starts, and changes often demand major programming efforts; Operation and maintenance of the network model is expensive and time consuming.@@@@1@55@@danf@17-8-2009 10190650@unknown@formal@none@1@S@Examples of database engines that have network model capabilities are [[RDM Embedded]] and [[RDM Server]].@@@@1@15@@danf@17-8-2009 10190660@unknown@formal@none@1@S@===Relational model===@@@@1@2@@danf@17-8-2009 10190670@unknown@formal@none@1@S@The basic data structure of the relational model is a table where information about a particular entity (say, an employee) is represented in columns and rows.@@@@1@26@@danf@17-8-2009 10190680@unknown@formal@none@1@S@The columns enumerate the various attributes of an entity (e.g. employee_name, address, phone_number).@@@@1@13@@danf@17-8-2009 10190690@unknown@formal@none@1@S@Rows (also called records) represent instances of an entity (e.g. 
specific employees).@@@@1@12@@danf@17-8-2009 10190700@unknown@formal@none@1@S@The "relation" in "relational database" comes from the mathematical notion of [[Relation (mathematics)|relations]] from the field of [[set theory]].@@@@1@19@@danf@17-8-2009 10190710@unknown@formal@none@1@S@A relation is a set of [[tuple]]s, so rows are sometimes called tuples.@@@@1@13@@danf@17-8-2009 10190720@unknown@formal@none@1@S@All tables in a relational database adhere to three basic rules.@@@@1@11@@danf@17-8-2009 10190730@unknown@formal@none@1@S@* The ordering of columns is immaterial@@@@1@7@@danf@17-8-2009 10190740@unknown@formal@none@1@S@* Identical rows are not allowed in a table@@@@1@9@@danf@17-8-2009 10190750@unknown@formal@none@1@S@* Each row has a single (separate) value for each of its columns (each tuple has an atomic value).@@@@1@19@@danf@17-8-2009 10190760@unknown@formal@none@1@S@If the same value occurs in two different records (from the same table or different tables) it can imply a relationship between those records.@@@@1@24@@danf@17-8-2009 10190770@unknown@formal@none@1@S@Relationships between records are often categorized by their [[Cardinality (data modeling)|cardinality]] (1:1, (0), 1:M, M:M).@@@@1@15@@danf@17-8-2009 10190780@unknown@formal@none@1@S@Tables can have a designated column or set of columns that act as a "key" to select rows from that table with the same or similar key values.@@@@1@28@@danf@17-8-2009 10190790@unknown@formal@none@1@S@A "primary key" is a key that has a unique value for each row in the table.@@@@1@17@@danf@17-8-2009 10190800@unknown@formal@none@1@S@Keys are commonly used to join or combine data from two or more tables.@@@@1@14@@danf@17-8-2009 10190810@unknown@formal@none@1@S@For example, an ''employee'' table may contain a column named ''address'' which contains a value that matches the key of an ''address'' table.@@@@1@23@@danf@17-8-2009 10190820@unknown@formal@none@1@S@Keys are also critical in the creation of indexes, which facilitate fast retrieval of data from large tables.@@@@1@18@@danf@17-8-2009 10190830@unknown@formal@none@1@S@It is not necessary to define all the keys in advance; a column can be used as a key even if it was not originally intended to be one.@@@@1@29@@danf@17-8-2009 10190840@unknown@formal@none@1@S@====Relational operations====@@@@1@2@@danf@17-8-2009 10190850@unknown@formal@none@1@S@Users (or programs) request data from a relational database by sending it a [[query]] that is written in a special language, usually a dialect of [[SQL]].@@@@1@26@@danf@17-8-2009 10190860@unknown@formal@none@1@S@Although SQL was originally intended for end-users, it is much more common for SQL queries to be embedded into software that provides an easier user interface.@@@@1@26@@danf@17-8-2009 10190870@unknown@formal@none@1@S@Many web applications, such as [[Wikipedia]], perform SQL queries when generating pages.@@@@1@12@@danf@17-8-2009 10190880@unknown@formal@none@1@S@In response to a query, the database returns a result set, which is the list of rows constituting the answer.@@@@1@20@@danf@17-8-2009 10190890@unknown@formal@none@1@S@The simplest query is just to return all the rows from a table, but more often, the rows are filtered in some way to return just the answer wanted.@@@@1@29@@danf@17-8-2009 10190900@unknown@formal@none@1@S@Often, data from multiple tables are combined into one, by doing a [[Join (SQL)|join]].@@@@1@14@@danf@17-8-2009 10190910@unknown@formal@none@1@S@There are a number of relational operations in addition to 
join.@@@@1@11@@danf@17-8-2009 10190920@unknown@formal@none@1@S@====Normal forms====@@@@1@2@@danf@17-8-2009 10190930@unknown@formal@none@1@S@Relations are classified based upon the types of anomalies to which they're vulnerable.@@@@1@13@@danf@17-8-2009 10190940@unknown@formal@none@1@S@A database that's in the first normal form is vulnerable to all types of anomalies, while a database that's in the domain/key normal form has no modification anomalies.@@@@1@28@@danf@17-8-2009 10190950@unknown@formal@none@1@S@Normal forms are hierarchical in nature.@@@@1@6@@danf@17-8-2009 10190960@unknown@formal@none@1@S@That is, the lowest level is the first normal form, and the database cannot meet the requirements for higher level normal forms without first having met all the requirements of the lesser normal form.@@@@1@34@@danf@17-8-2009 10190970@unknown@formal@none@1@S@==Database Management Systems==@@@@1@3@@danf@17-8-2009 10190980@unknown@formal@none@1@S@===Relational database management systems===@@@@1@4@@danf@17-8-2009 10190990@unknown@formal@none@1@S@An RDBMS implements the features of the relational model outlined above.@@@@1@11@@danf@17-8-2009 10191000@unknown@formal@none@1@S@In this context, [[Christopher J. Date|Date]]'s '''Information Principle''' states:@@@@1@9@@danf@17-8-2009 10191010@unknown@formal@none@1@S@
The entire information content of the database is represented in one and only one way.@@@@1@16@@danf@17-8-2009 10191020@unknown@formal@none@1@S@Namely, as explicit values in column positions (attributes) and rows in relations ([[tuple]]s). Therefore, there are no explicit pointers between related tables.
@@@@1@22@@danf@17-8-2009 10191030@unknown@formal@none@1@S@===Post-relational database models===@@@@1@3@@danf@17-8-2009 10191040@unknown@formal@none@1@S@Several products have been identified as [[post-relational]] because the data model incorporates [[relations]] but is not constrained by the Information Principle, requiring that all information is represented by [[data values]] in relations.@@@@1@32@@danf@17-8-2009 10191050@unknown@formal@none@1@S@Products using a post-relational data model typically employ a model that actually pre-dates the [[relational model]].@@@@1@16@@danf@17-8-2009 10191060@unknown@formal@none@1@S@These might be identified as a [[directed graph]] with [[tree data structure|trees]] on the [[data structure|nodes]].@@@@1@16@@danf@17-8-2009 10191070@unknown@formal@none@1@S@Examples of models that could be classified as post-relational are [[Pick operating system|PICK]] aka [[Multidimensional database|MultiValue]], and [[MUMPS]].@@@@1@18@@danf@17-8-2009 10191080@unknown@formal@none@1@S@===Object database models===@@@@1@3@@danf@17-8-2009 10191090@unknown@formal@none@1@S@In recent years, the [[object-oriented]] paradigm has been applied to database technology, creating a new programming model known as [[object database]]s.@@@@1@21@@danf@17-8-2009 10191100@unknown@formal@none@1@S@These databases attempt to bring the database world and the application programming world closer together, in particular by ensuring that the database uses the same [[type system]] as the application program.@@@@1@31@@danf@17-8-2009 10191110@unknown@formal@none@1@S@This aims to avoid the overhead (sometimes referred to as the ''[[Object-Relational impedance mismatch|impedance mismatch]]'') of converting information between its representation in the database (for example as rows in tables) and its representation in the application program (typically as objects).@@@@1@40@@danf@17-8-2009 10191120@unknown@formal@none@1@S@At the same time, object databases attempt to introduce the key ideas of object programming, such as [[encapsulation]] and [[polymorphism (computer science)|polymorphism]], into the world of databases.@@@@1@27@@danf@17-8-2009 10191130@unknown@formal@none@1@S@A variety of these ways have been tried for storing objects in a database.@@@@1@14@@danf@17-8-2009 10191140@unknown@formal@none@1@S@Some products have approached the problem from the application programming end, by making the objects manipulated by the program [[Persistence (computer science)|persistent]].@@@@1@22@@danf@17-8-2009 10191150@unknown@formal@none@1@S@This also typically requires the addition of some kind of query language, since conventional programming languages do not have the ability to find objects based on their information content.@@@@1@29@@danf@17-8-2009 10191160@unknown@formal@none@1@S@Others have attacked the problem from the database end, by defining an object-oriented data model for the database, and defining a database programming language that allows full programming capabilities as well as traditional query facilities.@@@@1@35@@danf@17-8-2009 10191170@unknown@formal@none@1@S@==DBMS internals==@@@@1@2@@danf@17-8-2009 10191180@unknown@formal@none@1@S@===Storage and physical database design===@@@@1@5@@danf@17-8-2009 10191190@unknown@formal@none@1@S@Database tables/indexes are typically stored in memory or on hard disk in one of many forms, ordered/unordered [[flat file database|flat files]], [[ISAM]], [[heap (data structure)|heaps]], [[hash table|hash buckets]] or [[B+ tree]]s.@@@@1@31@@danf@17-8-2009 
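As a loose sketch of why the choice of storage form matters (toy Python over a tiny in-memory "table"; real storage engines work on disk pages rather than Python lists), compare locating a row in an unordered heap with locating it through a sorted, index-like structure:

<source lang="python">
import bisect

# Toy "table": each row is (id, name); rows sit in insertion order, as in a heap file.
heap_rows = [(42, "Smith"), (7, "Jones"), (19, "Brown"), (3, "Davis")]

def heap_lookup(rows, key):
    """Unordered heap: every row may need to be examined (a full scan)."""
    for row in rows:
        if row[0] == key:
            return row
    return None

# A simple sorted index over the id column, mapping key -> position in heap_rows.
index = sorted((row[0], pos) for pos, row in enumerate(heap_rows))
index_keys = [k for k, _ in index]

def indexed_lookup(rows, key):
    """Sorted index: binary search narrows the search to one candidate row."""
    i = bisect.bisect_left(index_keys, key)
    if i < len(index) and index[i][0] == key:
        return rows[index[i][1]]
    return None

print(heap_lookup(heap_rows, 19))     # (19, 'Brown') after scanning up to all rows
print(indexed_lookup(heap_rows, 19))  # (19, 'Brown') after ~log2(n) comparisons
</source>

Each of the on-disk forms listed above (flat files, ISAM, heaps, hash buckets and B+ trees) makes this scan-versus-lookup trade-off in its own way.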
10191200@unknown@formal@none@1@S@These have various advantages and disadvantages discussed further in the main article on this topic.@@@@1@15@@danf@17-8-2009 10191210@unknown@formal@none@1@S@The most commonly used are B+ trees and ISAM.@@@@1@9@@danf@17-8-2009 10191220@unknown@formal@none@1@S@Other important design choices relate to the clustering of data by category (such as grouping data by month, or location), creating pre-computed views known as materialized views, partitioning data by range or hash.@@@@1@33@@danf@17-8-2009 10191230@unknown@formal@none@1@S@As well memory management and storage topology can be important design choices for database designers.@@@@1@15@@danf@17-8-2009 10191240@unknown@formal@none@1@S@Just as normalization is used to reduce storage requirements and improve the extensibility of the database, conversely denormalization is often used to reduce join complexity and reduce execution time for queries.@@@@1@31@@danf@17-8-2009 10191250@unknown@formal@none@1@S@====Indexing====@@@@1@1@@danf@17-8-2009 10191260@unknown@formal@none@1@S@All of these databases can take advantage of [[Index (database)|indexing]] to increase their speed.@@@@1@14@@danf@17-8-2009 10191270@unknown@formal@none@1@S@This technology has advanced tremendously since its early uses in the 1960s and 1970s.@@@@1@14@@danf@17-8-2009 10191280@unknown@formal@none@1@S@The most common kind of index is a sorted list of the contents of some particular table column, with pointers to the row associated with the value.@@@@1@27@@danf@17-8-2009 10191290@unknown@formal@none@1@S@An index allows a set of table rows matching some criterion to be located quickly.@@@@1@15@@danf@17-8-2009 10191300@unknown@formal@none@1@S@Typically, indexes are also stored in the various forms of data-structure mentioned above (such as [[B-tree]]s, [[hash table|hash]]es, and [[linked lists]]).@@@@1@21@@danf@17-8-2009 10191310@unknown@formal@none@1@S@Usually, a specific technique is chosen by the database designer to increase efficiency in the particular case of the type of index required.@@@@1@23@@danf@17-8-2009 10191320@unknown@formal@none@1@S@Relational DBMS's have the advantage that indexes can be created or dropped without changing existing applications making use of it.@@@@1@20@@danf@17-8-2009 10191330@unknown@formal@none@1@S@The database chooses between many different strategies based on which one it estimates will run the fastest.@@@@1@17@@danf@17-8-2009 10191340@unknown@formal@none@1@S@In other words, indexes are transparent to the application or end-user querying the database; while they affect performance, any SQL command will run with or without index to compute the result of an [[SQL]] statement.@@@@1@35@@danf@17-8-2009 10191350@unknown@formal@none@1@S@The RDBMS will produce a plan of how to execute the query, which is generated by analyzing the run times of the different algorithms and selecting the quickest.@@@@1@28@@danf@17-8-2009 10191360@unknown@formal@none@1@S@Some of the key algorithms that deal with [[join (SQL)|joins]] are [[nested loop join]], [[sort-merge join]] and [[hash join]].@@@@1@19@@danf@17-8-2009 10191370@unknown@formal@none@1@S@Which of these is chosen depends on whether an index exists, what type it is, and its [[Cardinality (SQL statements)|cardinality]].@@@@1@20@@danf@17-8-2009 10191380@unknown@formal@none@1@S@An index speeds up access to data, but it has disadvantages as well.@@@@1@13@@danf@17-8-2009 10191390@unknown@formal@none@1@S@First, every index increases the amount of storage on the hard drive 
necessary for the database file, and second, the index must be updated each time the data are altered, and this costs time.@@@@1@34@@danf@17-8-2009 10191400@unknown@formal@none@1@S@(Thus an index saves time in the reading of data, but it costs time in entering and altering data.@@@@1@19@@danf@17-8-2009 10191410@unknown@formal@none@1@S@It thus depends on the use to which the data are to be put whether an index is on the whole a net plus or minus in the quest for efficiency.)@@@@1@31@@danf@17-8-2009 10191420@unknown@formal@none@1@S@A special case of an index is a primary index, or primary key, which is distinguished in that the primary index must ensure a unique reference to a record.@@@@1@29@@danf@17-8-2009 10191430@unknown@formal@none@1@S@Often, for this purpose one simply uses a running index number (ID number).@@@@1@13@@danf@17-8-2009 10191440@unknown@formal@none@1@S@Primary indexes play a significant role in relational databases, and they can speed up access to data considerably.@@@@1@18@@danf@17-8-2009 10191450@unknown@formal@none@1@S@===Transactions and concurrency===@@@@1@3@@danf@17-8-2009 10191460@unknown@formal@none@1@S@In addition to their data model, most practical databases ("transactional databases") attempt to enforce a [[database transaction]] .@@@@1@18@@danf@17-8-2009 10191470@unknown@formal@none@1@S@Ideally, the database software should enforce the [[ACID]] rules, summarized here:@@@@1@11@@danf@17-8-2009 10191480@unknown@formal@none@1@S@* [[Atomicity]]: Either all the tasks in a transaction must be done, or none of them.@@@@1@16@@danf@17-8-2009 10191490@unknown@formal@none@1@S@The transaction must be completed, or else it must be undone (rolled back).@@@@1@13@@danf@17-8-2009 10191500@unknown@formal@none@1@S@* [[Database consistency|Consistency]]: Every transaction must preserve the integrity constraints — the declared consistency rules — of the database.@@@@1@19@@danf@17-8-2009 10191510@unknown@formal@none@1@S@It cannot place the data in a contradictory state.@@@@1@9@@danf@17-8-2009 10191520@unknown@formal@none@1@S@* [[Isolation]]: Two simultaneous transactions cannot interfere with one another.@@@@1@10@@danf@17-8-2009 10191530@unknown@formal@none@1@S@Intermediate results within a transaction are not visible to other transactions.@@@@1@11@@danf@17-8-2009 10191540@unknown@formal@none@1@S@* [[Durability (computer science)|Durability]]: Completed transactions cannot be aborted later or their results discarded.@@@@1@14@@danf@17-8-2009 10191550@unknown@formal@none@1@S@They must persist through (for instance) restarts of the DBMS after crashes@@@@1@12@@danf@17-8-2009 10191560@unknown@formal@none@1@S@In practice, many DBMS's allow most of these rules to be selectively relaxed for better performance.@@@@1@16@@danf@17-8-2009 10191570@unknown@formal@none@1@S@[[Concurrency control]] is a method used to ensure that transactions are executed in a safe manner and follow the ACID rules.@@@@1@21@@danf@17-8-2009 10191580@unknown@formal@none@1@S@The DBMS must be able to ensure that only [[serializability|serializable]], [[serializability#correctness - recoverability|recoverable]] schedules are allowed, and that no actions of committed transactions are lost while undoing aborted transactions .@@@@1@30@@danf@17-8-2009 10191590@unknown@formal@none@1@S@===Replication===@@@@1@1@@danf@17-8-2009 10191600@unknown@formal@none@1@S@Replication of databases is closely related to transactions.@@@@1@8@@danf@17-8-2009 10191610@unknown@formal@none@1@S@If a database can log its individual actions, it is 
possible to create a duplicate of the data in real time.@@@@1@21@@danf@17-8-2009 10191620@unknown@formal@none@1@S@The duplicate can be used to improve performance or availability of the whole database system.@@@@1@15@@danf@17-8-2009 10191630@unknown@formal@none@1@S@Common replication concepts include:@@@@1@4@@danf@17-8-2009 10191640@unknown@formal@none@1@S@* Master/Slave Replication: All write requests are performed on the master and then replicated to the slaves.@@@@1@17@@danf@17-8-2009 10191650@unknown@formal@none@1@S@* Quorum: The results of read and write requests are calculated by querying a "majority" of replicas.@@@@1@17@@danf@17-8-2009 10191660@unknown@formal@none@1@S@* Multimaster: Two or more replicas sync each other via a transaction identifier.@@@@1@13@@danf@17-8-2009 10191670@unknown@formal@none@1@S@Parallel synchronous replication of databases enables transactions to be replicated on multiple servers simultaneously, which provides a method for backup and security as well as data availability.@@@@1@27@@danf@17-8-2009 10191680@unknown@formal@none@1@S@===Security===@@@@1@1@@danf@17-8-2009 10191690@unknown@formal@none@1@S@[[Database security]] denotes the system, processes, and procedures that protect a database from unintended activity.@@@@1@15@@danf@17-8-2009 10191700@unknown@formal@none@1@S@Security is usually enforced through '''access control''', '''auditing''', and '''encryption'''.@@@@1@10@@danf@17-8-2009 10191710@unknown@formal@none@1@S@* Access control ensures and restricts who can connect and what can be done to the database.@@@@1@17@@danf@17-8-2009 10191720@unknown@formal@none@1@S@* Auditing logs what action or change has been performed, when and by whom.@@@@1@14@@danf@17-8-2009 10191730@unknown@formal@none@1@S@* Encryption: Since security has become a major issue in recent years, many commercial database vendors provide built-in encryption mechanisms.@@@@1@20@@danf@17-8-2009 10191740@unknown@formal@none@1@S@Data is encoded natively into the tables and deciphered "on the fly" when a query comes in.@@@@1@17@@danf@17-8-2009 10191745@unknown@formal@none@1@S@Connections can also be secured and encrypted if required using DSA, MD5, SSL or legacy encryption standards.@@@@1@17@@danf@17-8-2009 10191750@unknown@formal@none@1@S@Enforcing security is one of the major tasks of the DBA.@@@@1@11@@danf@17-8-2009 10191760@unknown@formal@none@1@S@In the United Kingdom, legislation protecting the public from unauthorized disclosure of personal information held on databases falls under the Office of the Information Commissioner.@@@@1@25@@danf@17-8-2009 10191770@unknown@formal@none@1@S@United Kingdom based organizations holding personal data in electronic format (databases for example) are required to register with the Data Commissioner.@@@@1@21@@danf@17-8-2009 10191780@unknown@formal@none@1@S@===Locking===@@@@1@1@@danf@17-8-2009 10191790@unknown@formal@none@1@S@[[Lock (computer science)|Locking]] is how the database handles multiple concurrent operations.@@@@1@11@@danf@17-8-2009 10191800@unknown@formal@none@1@S@It is the way in which concurrency and some basic form of integrity are managed within the database system.@@@@1@18@@danf@17-8-2009 10191810@unknown@formal@none@1@S@Such locks can be applied at the row level, or at other levels such as a page (a basic data block), an extent (an array of multiple pages) or even an entire table.@@@@1@29@@danf@17-8-2009 10191820@unknown@formal@none@1@S@This helps maintain the integrity of the data by ensuring that only one process at a time can modify the '''same''' data.@@@@1@22@@danf@17-8-2009 10191830@unknown@formal@none@1@S@This is unlike basic filesystem files or folders, where only one lock at a time can be set, restricting usage to one process only.@@@@1@24@@danf@17-8-2009 10191840@unknown@formal@none@1@S@A database can set and hold multiple locks at the same time on different levels of the physical data structure.@@@@1@21@@danf@17-8-2009 10191850@unknown@formal@none@1@S@How locks are set and how long they last is determined by the database engine's locking scheme, based on the SQL or transactions submitted by the users.@@@@1@23@@danf@17-8-2009 10191860@unknown@formal@none@1@S@Generally speaking, when there is no activity on the database, there should be no or only very light locking.@@@@1@16@@danf@17-8-2009 10191870@unknown@formal@none@1@S@For most DBMS systems on the market, locks are generally '''shared''' or '''exclusive'''.@@@@1@14@@danf@17-8-2009 10191880@unknown@formal@none@1@S@An exclusive lock means that no other lock can be acquired on the current data object as long as the exclusive lock lasts.@@@@1@20@@danf@17-8-2009 10191890@unknown@formal@none@1@S@Exclusive locks are usually set while the database needs to change data, like during an UPDATE or DELETE operation.@@@@1@19@@danf@17-8-2009 10191900@unknown@formal@none@1@S@Several shared locks, by contrast, can be held on the current data structure at the same time.@@@@1@14@@danf@17-8-2009 10191910@unknown@formal@none@1@S@Shared locks are usually used while the database is reading data, during a SELECT operation.@@@@1@15@@danf@17-8-2009 10191920@unknown@formal@none@1@S@The number and nature of locks, and the time for which a lock holds a data block, can have a huge impact on database performance.@@@@1@22@@danf@17-8-2009 10191930@unknown@formal@none@1@S@Bad locking can lead to disastrous performance (usually the result of poor SQL requests or an inadequate physical database structure).@@@@1@20@@danf@17-8-2009 10191940@unknown@formal@none@1@S@Default locking behavior is enforced by the '''isolation level''' of the dataserver.@@@@1@12@@danf@17-8-2009 10191950@unknown@formal@none@1@S@Changing the isolation level will affect how shared or exclusive locks must be set on the data for the entire database system.@@@@1@22@@danf@17-8-2009 10191960@unknown@formal@none@1@S@The default isolation level is generally 1, where data cannot be read while it is being modified, forbidding "ghost data" from being returned to the end user.@@@@1@23@@danf@17-8-2009 10191970@unknown@formal@none@1@S@At some point, intensive or inappropriate exclusive locking can lead to a "deadlock" situation between two locks.@@@@1@18@@danf@17-8-2009 10191980@unknown@formal@none@1@S@In a deadlock, none of the locks can be released because each is trying to acquire resources held by the other.@@@@1@18@@danf@17-8-2009 10191990@unknown@formal@none@1@S@The database has a fail-safe mechanism and will automatically "sacrifice" one of the locks, releasing the resource.@@@@1@18@@danf@17-8-2009 10192000@unknown@formal@none@1@S@In doing so, the processes or transactions involved in the "deadlock" will be rolled back.@@@@1@14@@danf@17-8-2009 10192010@unknown@formal@none@1@S@Databases can also be locked for other reasons, like access restrictions for given levels of user.@@@@1@16@@danf@17-8-2009 10192020@unknown@formal@none@1@S@Databases are also locked for routine database maintenance, which prevents changes being made during the maintenance.@@@@1@16@@danf@17-8-2009 10192030@unknown@formal@none@1@S@(See [http://publib.boulder.ibm.com/infocenter/rbhelp/v6r3/index.jsp?topic=/com.ibm.redbrick.doc6.3/wag/wag80.htm IBM] for more 
10192040@unknown@formal@none@1@S@===Architecture===@@@@1@1@@danf@17-8-2009 10192050@unknown@formal@none@1@S@Depending on the intended use, there are a number of database architectures in use.@@@@1@14@@danf@17-8-2009 10192060@unknown@formal@none@1@S@Many databases use a combination of strategies.@@@@1@7@@danf@17-8-2009 10192070@unknown@formal@none@1@S@On-line Transaction Processing systems (OLTP) often use a row-oriented datastore architecture, while data-warehouse and other retrieval-focused applications like [[Google]]'s [[BigTable]], or bibliographic database (library catalogue) systems may use a column-oriented datastore architecture.@@@@1@31@@danf@17-8-2009 10192080@unknown@formal@none@1@S@Document-Oriented, XML, Knowledgebases, as well as frame databases and rdf-stores (aka Triple-Stores), may also use a combination of these architectures in their implementation.@@@@1@23@@danf@17-8-2009 10192090@unknown@formal@none@1@S@Finally, it should be noted that not all databases have or need a database 'schema' (so-called schema-less databases).@@@@1@19@@danf@17-8-2009 10192100@unknown@formal@none@1@S@==Applications of databases==@@@@1@3@@danf@17-8-2009 10192110@unknown@formal@none@1@S@Databases are used in many applications, spanning virtually the entire range of [[computer software]].@@@@1@14@@danf@17-8-2009 10192120@unknown@formal@none@1@S@Databases are the preferred method of storage for large multiuser applications, where coordination between many users is needed.@@@@1@18@@danf@17-8-2009 10192130@unknown@formal@none@1@S@Even individual users find them convenient, and many electronic mail programs and personal organizers are based on standard database technology.@@@@1@20@@danf@17-8-2009 10192140@unknown@formal@none@1@S@Software database drivers are available for most database platforms so that [[application software]] can use a common [[Application Programming Interface]] to retrieve the information stored in a database.@@@@1@28@@danf@17-8-2009 10192150@unknown@formal@none@1@S@Two commonly used database APIs are [[Java Database Connectivity|JDBC]] and [[ODBC]].@@@@1@11@@danf@17-8-2009 10192160@unknown@formal@none@1@S@For example, a suppliers database contains data relating to suppliers, such as:@@@@1@12@@danf@17-8-2009 10192170@unknown@formal@none@1@S@*supplier name@@@@1@2@@danf@17-8-2009 10192180@unknown@formal@none@1@S@*supplier code@@@@1@2@@danf@17-8-2009 10192190@unknown@formal@none@1@S@*supplier address@@@@1@2@@danf@17-8-2009 10192200@unknown@formal@none@1@S@Databases are also often used by schools to teach students and to record their grades.@@@@1@12@@danf@17-8-2009 10192210@unknown@formal@none@1@S@==Links to DBMS products==@@@@1@4@@danf@17-8-2009 10192220@unknown@formal@none@1@S@*[[4th Dimension (Software)|4D]]@@@@1@3@@danf@17-8-2009 10192230@unknown@formal@none@1@S@*[[ADABAS]]@@@@1@1@@danf@17-8-2009 10192240@unknown@formal@none@1@S@*[[Alpha Five]]@@@@1@2@@danf@17-8-2009 10192250@unknown@formal@none@1@S@*[[Apache Derby]] (Java, also known as IBM Cloudscape and Sun Java DB)@@@@1@12@@danf@17-8-2009 10192260@unknown@formal@none@1@S@*[[BerkeleyDB]]@@@@1@1@@danf@17-8-2009 10192270@unknown@formal@none@1@S@*[[CouchDB]]@@@@1@1@@danf@17-8-2009 10192280@unknown@formal@none@1@S@*[[CSQL]]@@@@1@1@@danf@17-8-2009 10192290@unknown@formal@none@1@S@*[[Datawasp]]@@@@1@1@@danf@17-8-2009 10192300@unknown@formal@none@1@S@*[[Db4objects]]@@@@1@1@@danf@17-8-2009 10192310@unknown@formal@none@1@S@*[[dBase]]@@@@1@1@@danf@17-8-2009 10192320@unknown@formal@none@1@S@*[[FileMaker]]@@@@1@1@@danf@17-8-2009
10192330@unknown@formal@none@1@S@*[[Firebird (database server)]]@@@@1@3@@danf@17-8-2009 10192340@unknown@formal@none@1@S@*[[H2 (DBMS)|H2]] (Java)@@@@1@3@@danf@17-8-2009 10192350@unknown@formal@none@1@S@*[[Hsqldb]] (Java)@@@@1@2@@danf@17-8-2009 10192360@unknown@formal@none@1@S@*[[IBM DB2]]@@@@1@2@@danf@17-8-2009 10192370@unknown@formal@none@1@S@*[[Information Management System|IBM IMS (Information Management System)]]@@@@1@7@@danf@17-8-2009 10192380@unknown@formal@none@1@S@*[[IBM UniVerse]]@@@@1@2@@danf@17-8-2009 10192390@unknown@formal@none@1@S@*[[Informix]]@@@@1@1@@danf@17-8-2009 10192400@unknown@formal@none@1@S@*[[Ingres (database)|Ingres]]@@@@1@2@@danf@17-8-2009 10192410@unknown@formal@none@1@S@*[[Interbase]]@@@@1@1@@danf@17-8-2009 10192420@unknown@formal@none@1@S@*[[InterSystems Caché]]@@@@1@2@@danf@17-8-2009 10192430@unknown@formal@none@1@S@*[[MaxDB]] (formerly SapDB)@@@@1@3@@danf@17-8-2009 10192440@unknown@formal@none@1@S@*[[Microsoft Access]]@@@@1@2@@danf@17-8-2009 10192450@unknown@formal@none@1@S@*[[Microsoft SQL Server]]@@@@1@3@@danf@17-8-2009 10192460@unknown@formal@none@1@S@*[[Model 204]]@@@@1@2@@danf@17-8-2009 10192470@unknown@formal@none@1@S@*[[MySQL]]@@@@1@1@@danf@17-8-2009 10192480@unknown@formal@none@1@S@*[[Nomad software|Nomad]]@@@@1@2@@danf@17-8-2009 10192490@unknown@formal@none@1@S@*[[Objectivity/DB]]@@@@1@1@@danf@17-8-2009 10192500@unknown@formal@none@1@S@*[[ObjectStore]]@@@@1@1@@danf@17-8-2009 10192510@unknown@formal@none@1@S@*[[Virtuoso Universal Server|OpenLink Virtuoso]]@@@@1@4@@danf@17-8-2009 10192520@unknown@formal@none@1@S@*[[OpenOffice.org Base]]@@@@1@2@@danf@17-8-2009 10192530@unknown@formal@none@1@S@*[[Oracle Database]]@@@@1@2@@danf@17-8-2009 10192540@unknown@formal@none@1@S@*[[Paradox (database)]]@@@@1@2@@danf@17-8-2009 10192550@unknown@formal@none@1@S@*[[Polyhedra DBMS]]@@@@1@2@@danf@17-8-2009 10192560@unknown@formal@none@1@S@*[[PostgreSQL]]@@@@1@1@@danf@17-8-2009 10192570@unknown@formal@none@1@S@*[[Progress 4GL]]@@@@1@2@@danf@17-8-2009 10192580@unknown@formal@none@1@S@*[[RDM Embedded]]@@@@1@2@@danf@17-8-2009 10192590@unknown@formal@none@1@S@*[[ScimoreDB]]@@@@1@1@@danf@17-8-2009 10192600@unknown@formal@none@1@S@*[[Sedna (database)|Sedna]]@@@@1@2@@danf@17-8-2009 10192610@unknown@formal@none@1@S@*[[SQLite]]@@@@1@1@@danf@17-8-2009 10192620@unknown@formal@none@1@S@*[[Superbase database|Superbase]]@@@@1@2@@danf@17-8-2009 10192630@unknown@formal@none@1@S@*[[Sybase]]@@@@1@1@@danf@17-8-2009 10192640@unknown@formal@none@1@S@*[[Teradata]]@@@@1@1@@danf@17-8-2009 10192650@unknown@formal@none@1@S@*[[Vertica]]@@@@1@1@@danf@17-8-2009 10192660@unknown@formal@none@1@S@*[[Visual FoxPro]]@@@@1@2@@danf@17-8-2009 10200010@unknown@formal@none@1@S@
Cluster analysis
@@@@1@2@@danf@17-8-2009 10200020@unknown@formal@none@1@S@'''Clustering''' is the [[Statistical classification|classification]] of objects into different groups, or more precisely, the [[partition of a set|partitioning]] of a [[data set]] into [[subset]]s (clusters), so that the data in each subset (ideally) share some common trait - often proximity according to some defined [[metric (mathematics)|distance measure]].@@@@1@47@@danf@17-8-2009 10200030@unknown@formal@none@1@S@Data clustering is a common technique for [[statistics|statistical]] [[data analysis]], which is used in many fields, including [[machine learning]], [[data mining]], [[pattern recognition]], [[image analysis]] and [[bioinformatics]].@@@@1@27@@danf@17-8-2009 10200040@unknown@formal@none@1@S@The computational task of classifying the data set into ''k'' clusters is often referred to as '''''k''-clustering'''''.@@@@1@17@@danf@17-8-2009 10200050@unknown@formal@none@1@S@Besides the term ''data clustering'' (or just ''clustering''), there are a number of terms with similar meanings, including ''cluster analysis'', ''automatic classification'', ''numerical taxonomy'', ''botryology'' and ''typological analysis''.@@@@1@28@@danf@17-8-2009 10200060@unknown@formal@none@1@S@== Types of clustering ==@@@@1@5@@danf@17-8-2009 10200070@unknown@formal@none@1@S@Data clustering algorithms can be [[hierarchical]].@@@@1@6@@danf@17-8-2009 10200080@unknown@formal@none@1@S@Hierarchical algorithms find successive clusters using previously established clusters.@@@@1@9@@danf@17-8-2009 10200090@unknown@formal@none@1@S@Hierarchical algorithms can be agglomerative ("bottom-up") or divisive ("top-down").@@@@1@9@@danf@17-8-2009 10200100@unknown@formal@none@1@S@Agglomerative algorithms begin with each element as a separate cluster and merge them into successively larger clusters.@@@@1@17@@danf@17-8-2009 10200110@unknown@formal@none@1@S@Divisive algorithms begin with the whole set and proceed to divide it into successively smaller clusters.@@@@1@16@@danf@17-8-2009 10200120@unknown@formal@none@1@S@[[partition of a set|Partitional]] algorithms typically determine all clusters at once, but can also be used as divisive algorithms in the [[hierarchical]] clustering.@@@@1@23@@danf@17-8-2009 10200130@unknown@formal@none@1@S@''Two-way clustering'', ''co-clustering'' or [[biclustering]] are clustering methods where not only the objects are clustered but also the features of the objects, i.e., if the data is represented in a [[data matrix (statistics)|data matrix]], the rows and columns are clustered simultaneously.@@@@1@41@@danf@17-8-2009 10200140@unknown@formal@none@1@S@Another important distinction is whether the clustering uses symmetric or asymmetric distances.@@@@1@12@@danf@17-8-2009 10200150@unknown@formal@none@1@S@A property of [[Euclidean space]] is that distances are symmetric (the distance from object'' A'' to ''B'' is the same as the distance from ''B'' to ''A'').@@@@1@27@@danf@17-8-2009 10200160@unknown@formal@none@1@S@In other applications (e.g., sequence-alignment methods, see Prinzie & Van den Poel (2006)), this is not the case.@@@@1@18@@danf@17-8-2009 10200170@unknown@formal@none@1@S@== Distance measure ==@@@@1@4@@danf@17-8-2009 10200180@unknown@formal@none@1@S@An important step in any clustering is to select a [[Distance|distance measure]], which will determine how the ''similarity'' of two elements is calculated.@@@@1@23@@danf@17-8-2009 10200190@unknown@formal@none@1@S@This will influence the shape of the clusters, as some elements may be close to 
one another according to one distance and further away according to another.@@@@1@27@@danf@17-8-2009 10200200@unknown@formal@none@1@S@For example, in a 2-dimensional space, the distance between the point (x=1, y=0) and the origin (x=0, y=0) is always 1 according to the usual norms, but the distance between the point (x=1, y=1) and the origin can be 2, \\sqrt 2 or 1 if you take respectively the 1-norm, 2-norm or infinity-norm distance.@@@@1@53@@danf@17-8-2009 10200210@unknown@formal@none@1@S@Common distance functions:@@@@1@3@@danf@17-8-2009 10200220@unknown@formal@none@1@S@* The [[Euclidean distance]] (also called distance [[as the crow flies]] or 2-norm distance).@@@@1@14@@danf@17-8-2009 10200230@unknown@formal@none@1@S@A review of cluster analysis in health psychology research found that the most common distance measure in published studies in that research area is the Euclidean distance or the squared Euclidean distance.@@@@1@32@@danf@17-8-2009 10200240@unknown@formal@none@1@S@* The [[Manhattan distance]] (also called taxicab norm or 1-norm)@@@@1@10@@danf@17-8-2009 10200250@unknown@formal@none@1@S@* The [[Maximum_norm|maximum norm]]@@@@1@4@@danf@17-8-2009 10200260@unknown@formal@none@1@S@* The [[Mahalanobis distance]] corrects data for different scales and correlations in the variables@@@@1@14@@danf@17-8-2009 10200270@unknown@formal@none@1@S@* The angle between two vectors can be used as a distance measure when clustering high dimensional data.@@@@1@18@@danf@17-8-2009 10200280@unknown@formal@none@1@S@See [[Inner product space]].@@@@1@4@@danf@17-8-2009 10200290@unknown@formal@none@1@S@* The [[Hamming distance]] (sometimes edit distance) measures the minimum number of substitutions required to change one member into another.@@@@1@20@@danf@17-8-2009 10200300@unknown@formal@none@1@S@==Hierarchical clustering==@@@@1@2@@danf@17-8-2009 10200310@unknown@formal@none@1@S@===Creating clusters===@@@@1@2@@danf@17-8-2009 10200320@unknown@formal@none@1@S@Hierarchical clustering builds (agglomerative), or breaks up (divisive), a hierarchy of clusters.@@@@1@12@@danf@17-8-2009 10200330@unknown@formal@none@1@S@The traditional representation of this hierarchy is a [[tree data structure|tree]] (called a [[dendrogram]]), with individual elements at one end and a single cluster containing every element at the other.@@@@1@30@@danf@17-8-2009 10200340@unknown@formal@none@1@S@Agglomerative algorithms begin with the individual elements at the leaves of the tree, whereas divisive algorithms begin at the root.@@@@1@16@@danf@17-8-2009 10200350@unknown@formal@none@1@S@(In the figure, the arrows indicate an agglomerative clustering.)@@@@1@9@@danf@17-8-2009 10200360@unknown@formal@none@1@S@Cutting the tree at a given height will give a clustering at a selected precision.@@@@1@15@@danf@17-8-2009 10200370@unknown@formal@none@1@S@In the following example, cutting after the second row will yield clusters {a} {b c} {d e} {f}.@@@@1@18@@danf@17-8-2009 10200380@unknown@formal@none@1@S@Cutting after the third row will yield clusters {a} {b c} {d e f}, which is a coarser clustering, with a smaller number of larger clusters.@@@@1@26@@danf@17-8-2009 10200390@unknown@formal@none@1@S@===Agglomerative hierarchical clustering===@@@@1@3@@danf@17-8-2009 10200400@unknown@formal@none@1@S@For example, suppose this data is to be clustered, and the [[euclidean distance]] is the [[Metric (mathematics)|distance metric]].@@@@1@18@@danf@17-8-2009 10200410@unknown@formal@none@1@S@The hierarchical clustering [[dendrogram]] would be as such:@@@@1@8@@danf@17-8-2009
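The merge order that such a dendrogram records can also be computed directly. The sketch below is a minimal, illustrative Python implementation of agglomerative clustering with single linkage on six hypothetical one-dimensional points labelled a to f; it is not taken from any particular library, and real data would normally be multi-dimensional.

<source lang="python">
# Minimal agglomerative clustering (single linkage) on 1-D toy data.
# Points and labels are hypothetical; real uses start from a distance
# matrix over the actual elements.
points = {"a": 1.0, "b": 2.0, "c": 2.5, "d": 6.0, "e": 6.5, "f": 10.0}

def single_linkage(c1, c2):
    """Distance between two clusters = minimum pairwise distance."""
    return min(abs(points[x] - points[y]) for x in c1 for y in c2)

# Start with every element in its own cluster.
clusters = [frozenset([name]) for name in points]

while len(clusters) > 1:
    # Find the two closest clusters under the chosen linkage.
    i, j = min(
        ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
        key=lambda ij: single_linkage(clusters[ij[0]], clusters[ij[1]]),
    )
    dist = single_linkage(clusters[i], clusters[j])
    merged = clusters[i] | clusters[j]
    print(f"merge {set(clusters[i])} + {set(clusters[j])} at distance {dist}")
    clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
</source>

Stopping once the next merge distance exceeds a chosen threshold corresponds to cutting the dendrogram at that height, as discussed above.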
10200420@unknown@formal@none@1@S@This method builds the hierarchy from the individual elements by progressively merging clusters.@@@@1@13@@danf@17-8-2009 10200430@unknown@formal@none@1@S@In our example, we have six elements {a} {b} {c} {d} {e} and {f}.@@@@1@14@@danf@17-8-2009 10200440@unknown@formal@none@1@S@The first step is to determine which elements to merge in a cluster.@@@@1@13@@danf@17-8-2009 10200450@unknown@formal@none@1@S@Usually, we want to take the two closest elements, according to the chosen distance.@@@@1@14@@danf@17-8-2009 10200460@unknown@formal@none@1@S@Optionally, one can also construct a [[distance matrix]] at this stage, where the number in the ''i''-th row ''j''-th column is the distance between the ''i''-th and ''j''-th elements.@@@@1@29@@danf@17-8-2009 10200470@unknown@formal@none@1@S@Then, as clustering progresses, rows and columns are merged as the clusters are merged and the distances updated.@@@@1@18@@danf@17-8-2009 10200480@unknown@formal@none@1@S@This is a common way to implement this type of clustering, and has the benefit of caching distances between clusters.@@@@1@20@@danf@17-8-2009 10200490@unknown@formal@none@1@S@A simple agglomerative clustering algorithm is described in the [[single linkage clustering]] page; it can easily be adapted to different types of linkage (see below).@@@@1@25@@danf@17-8-2009 10200500@unknown@formal@none@1@S@Suppose we have merged the two closest elements ''b'' and ''c'', we now have the following clusters {''a''}, {''b'', ''c''}, {''d''}, {''e''} and {''f''}, and want to merge them further.@@@@1@30@@danf@17-8-2009 10200510@unknown@formal@none@1@S@To do that, we need to take the distance between {a} and {b c}, and therefore define the distance between two clusters.@@@@1@22@@danf@17-8-2009 10200520@unknown@formal@none@1@S@Usually the distance between two clusters \\mathcal{A} and \\mathcal{B} is one of the following:@@@@1@14@@danf@17-8-2009 10200530@unknown@formal@none@1@S@* The maximum distance between elements of each cluster (also called complete linkage clustering):@@@@1@14@@danf@17-8-2009 10200540@unknown@formal@none@1@S@:: \\max \\{\\, d(x,y) : x \\in \\mathcal{A},\\, y \\in \\mathcal{B}\\,\\} @@@@1@12@@danf@17-8-2009 10200550@unknown@formal@none@1@S@* The minimum distance between elements of each cluster (also called [[single linkage clustering]]):@@@@1@14@@danf@17-8-2009 10200560@unknown@formal@none@1@S@:: \\min \\{\\, d(x,y) : x \\in \\mathcal{A},\\, y \\in \\mathcal{B} \\,\\} @@@@1@13@@danf@17-8-2009 10200570@unknown@formal@none@1@S@* The mean distance between elements of each cluster (also called average linkage clustering, used e.g. 
in [[UPGMA]]):@@@@1@18@@danf@17-8-2009 10200580@unknown@formal@none@1@S@:: {1 \\over {|\\mathcal{A}|\\cdot|\\mathcal{B}|}}\\sum_{x \\in \\mathcal{A}}\\sum_{ y \\in \\mathcal{B}} d(x,y) @@@@1@11@@danf@17-8-2009 10200590@unknown@formal@none@1@S@* The sum of all intra-cluster variance@@@@1@7@@danf@17-8-2009 10200600@unknown@formal@none@1@S@* The increase in variance for the cluster being merged ([[Ward's criterion]])@@@@1@12@@danf@17-8-2009 10200610@unknown@formal@none@1@S@* The probability that candidate clusters spawn from the same distribution function (V-linkage)@@@@1@13@@danf@17-8-2009 10200620@unknown@formal@none@1@S@Each agglomeration occurs at a greater distance between clusters than the previous agglomeration, and one can decide to stop clustering either when the clusters are too far apart to be merged (distance criterion) or when there is a sufficiently small number of clusters (number criterion).@@@@1@45@@danf@17-8-2009 10200630@unknown@formal@none@1@S@=== Concept clustering ===@@@@1@4@@danf@17-8-2009 10200640@unknown@formal@none@1@S@Another variation of the agglomerative clustering approach is [[conceptual clustering]].@@@@1@10@@danf@17-8-2009 10200650@unknown@formal@none@1@S@==Partitional clustering==@@@@1@2@@danf@17-8-2009 10200660@unknown@formal@none@1@S@===''K''-means and derivatives===@@@@1@3@@danf@17-8-2009 10200670@unknown@formal@none@1@S@====''K''-means clustering====@@@@1@2@@danf@17-8-2009 10200680@unknown@formal@none@1@S@The [[K-means algorithm|''K''-means algorithm]] assigns each point to the cluster whose center (also called centroid) is nearest.@@@@1@17@@danf@17-8-2009 10200690@unknown@formal@none@1@S@The center is the average of all the points in the cluster — that is, its coordinates are the arithmetic mean for each dimension separately over all the points in the cluster...@@@@1@32@@danf@17-8-2009 10200700@unknown@formal@none@1@S@:''Example:'' The data set has three dimensions and the cluster has two points: ''X'' = (''x''1, ''x''2, ''x''3) and ''Y'' = (''y''1, ''y''2, ''y''3).@@@@1@24@@danf@17-8-2009 10200710@unknown@formal@none@1@S@Then the centroid ''Z'' becomes ''Z'' = (''z''1, ''z''2, ''z''3), where ''z''1 = (''x''1 + ''y''1)/2 and ''z''2 = (''x''2 + ''y''2)/2 and ''z''3 = (''x''3 + ''y''3)/2.@@@@1@22@@danf@17-8-2009 10200720@unknown@formal@none@1@S@The algorithm steps are (J. 
MacQueen, 1967):@@@@1@7@@danf@17-8-2009 10200730@unknown@formal@none@1@S@* Choose the number of clusters, ''k''.@@@@1@7@@danf@17-8-2009 10200740@unknown@formal@none@1@S@* Randomly generate ''k'' clusters and determine the cluster centers, or directly generate ''k'' random points as cluster centers.@@@@1@19@@danf@17-8-2009 10200750@unknown@formal@none@1@S@* Assign each point to the nearest cluster center.@@@@1@9@@danf@17-8-2009 10200760@unknown@formal@none@1@S@* Recompute the new cluster centers.@@@@1@6@@danf@17-8-2009 10200770@unknown@formal@none@1@S@* Repeat the two previous steps until some convergence criterion is met (usually that the assignment hasn't changed).@@@@1@18@@danf@17-8-2009 10200780@unknown@formal@none@1@S@The main advantages of this algorithm are its simplicity and speed, which allow it to run on large datasets.@@@@1@19@@danf@17-8-2009 10200790@unknown@formal@none@1@S@Its disadvantage is that it does not yield the same result with each run, since the resulting clusters depend on the initial random assignments.@@@@1@24@@danf@17-8-2009 10200800@unknown@formal@none@1@S@It minimizes intra-cluster variance, but does not ensure that the result has a global minimum of variance.@@@@1@17@@danf@17-8-2009
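As a rough illustration of these steps, the following Python sketch implements them directly on a handful of invented two-dimensional points; the data, the choice of ''k'' and the iteration limit are arbitrary, and in practice a library implementation would normally be used.

<source lang="python">
# Minimal k-means sketch following the steps above (hypothetical 2-D data).
import random

def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(cluster):
    """Coordinate-wise arithmetic mean of a list of points."""
    n = len(cluster)
    return tuple(sum(coord) / n for coord in zip(*cluster))

def kmeans(points, k, iters=100):
    # Steps 1-2: pick k random points as the initial cluster centers.
    centers = random.sample(points, k)
    for _ in range(iters):
        # Step 3: assign each point to the nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: dist2(p, centers[i]))
            clusters[idx].append(p)
        # Step 4: recompute each center as the mean of its cluster.
        new_centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
        # Step 5: stop once the centers (and hence the assignment) no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters

data = [(1, 1), (1.5, 2), (3, 4), (5, 7), (3.5, 5), (4.5, 5), (3.5, 4.5)]
centers, clusters = kmeans(data, k=2)
print(centers)
</source>

Because the initial centers are chosen at random, repeated runs can return different clusterings, which is the disadvantage noted above.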
10200810@unknown@formal@none@1@S@====Fuzzy ''c''-means clustering====@@@@1@3@@danf@17-8-2009 10200820@unknown@formal@none@1@S@In [[fuzzy clustering]], each point has a degree of belonging to clusters, as in [[fuzzy logic]], rather than belonging completely to just one cluster.@@@@1@24@@danf@17-8-2009 10200830@unknown@formal@none@1@S@Thus, points on the edge of a cluster may be ''in the cluster'' to a lesser degree than points in the center of the cluster.@@@@1@24@@danf@17-8-2009 10200840@unknown@formal@none@1@S@For each point ''x'' we have a coefficient giving the degree of being in the ''k''th cluster u_k(x).@@@@1@18@@danf@17-8-2009 10200850@unknown@formal@none@1@S@Usually, the sum of those coefficients is defined to be 1:@@@@1@11@@danf@17-8-2009 10200860@unknown@formal@none@1@S@: \\forall x \\sum_{k=1}^{\\mathrm{num.}\\ \\mathrm{clusters}} u_k(x) \\ =1.@@@@1@8@@danf@17-8-2009 10200870@unknown@formal@none@1@S@With fuzzy ''c''-means, the centroid of a cluster is the mean of all points, weighted by their degree of belonging to the cluster:@@@@1@23@@danf@17-8-2009 10200880@unknown@formal@none@1@S@:\\mathrm{center}_k = {{\\sum_x u_k(x)^m x} \\over {\\sum_x u_k(x)^m}}.@@@@1@8@@danf@17-8-2009 10200890@unknown@formal@none@1@S@The degree of belonging is related to the inverse of the distance to the cluster@@@@1@15@@danf@17-8-2009 10200900@unknown@formal@none@1@S@:u_k(x) = {1 \\over d(\\mathrm{center}_k,x)},@@@@1@5@@danf@17-8-2009 10200910@unknown@formal@none@1@S@then the coefficients are normalized and fuzzified with a real parameter m>1 so that their sum is 1.@@@@1@18@@danf@17-8-2009 10200920@unknown@formal@none@1@S@So@@@@1@1@@danf@17-8-2009 10200930@unknown@formal@none@1@S@:u_k(x) = \\frac{1}{\\sum_j \\left(\\frac{d(\\mathrm{center}_k,x)}{d(\\mathrm{center}_j,x)}\\right)^{2/(m-1)}}.@@@@1@4@@danf@17-8-2009 10200940@unknown@formal@none@1@S@For ''m'' equal to 2, this is equivalent to normalising the coefficients linearly so that their sum is 1.@@@@1@18@@danf@17-8-2009 10200950@unknown@formal@none@1@S@When ''m'' is close to 1, the cluster center closest to the point is given much more weight than the others, and the algorithm is similar to ''k''-means.@@@@1@28@@danf@17-8-2009 10200960@unknown@formal@none@1@S@The fuzzy ''c''-means algorithm is very similar to the ''k''-means algorithm:@@@@1@11@@danf@17-8-2009 10200970@unknown@formal@none@1@S@* Choose a number of clusters.@@@@1@6@@danf@17-8-2009 10200980@unknown@formal@none@1@S@* Assign randomly to each point coefficients for being in the clusters.@@@@1@12@@danf@17-8-2009 10200990@unknown@formal@none@1@S@* Repeat until the algorithm has converged (that is, the coefficients' change between two iterations is no more than \\epsilon, the given sensitivity threshold):@@@@1@25@@danf@17-8-2009 10201000@unknown@formal@none@1@S@** Compute the centroid for each cluster, using the formula above.@@@@1@11@@danf@17-8-2009 10201010@unknown@formal@none@1@S@** For each point, compute its coefficients of being in the clusters, using the formula above.@@@@1@16@@danf@17-8-2009 10201020@unknown@formal@none@1@S@The algorithm minimizes intra-cluster variance as well, but it has the same problems as ''k''-means: the minimum is a local minimum, and the results depend on the initial choice of weights.@@@@1@30@@danf@17-8-2009 10201030@unknown@formal@none@1@S@The [[Expectation-maximization algorithm]] is a more statistically formalized method which includes some of these ideas: partial membership in classes.@@@@1@19@@danf@17-8-2009 10201040@unknown@formal@none@1@S@It has better convergence properties and is in general preferred to fuzzy ''c''-means.@@@@1@12@@danf@17-8-2009 10201050@unknown@formal@none@1@S@====QT clustering algorithm====@@@@1@3@@danf@17-8-2009 10201060@unknown@formal@none@1@S@QT (quality threshold) clustering (Heyer et al., 1999) is an alternative method of partitioning data, invented for gene clustering.@@@@1@19@@danf@17-8-2009 10201070@unknown@formal@none@1@S@It requires more computing power than ''k''-means, but does not require specifying the number of clusters ''a priori'', and always returns the same result when run several times.@@@@1@28@@danf@17-8-2009 10201080@unknown@formal@none@1@S@The algorithm is:@@@@1@3@@danf@17-8-2009 10201090@unknown@formal@none@1@S@* The user chooses a maximum diameter for clusters.@@@@1@9@@danf@17-8-2009 10201100@unknown@formal@none@1@S@* Build a candidate cluster for each point by including the closest point, the next closest, and so on, until the diameter of the cluster surpasses the threshold.@@@@1@28@@danf@17-8-2009 10201110@unknown@formal@none@1@S@* Save the candidate cluster with the most points as the first true cluster, and remove all points in the cluster from further consideration.@@@@1@24@@danf@17-8-2009 10201120@unknown@formal@none@1@S@(The description does not specify which candidate is kept if more than one cluster has the maximum number of points.)@@@@1@16@@danf@17-8-2009 10201130@unknown@formal@none@1@S@* [[Recursion|Recurse]] with the reduced set of points.@@@@1@8@@danf@17-8-2009 10201140@unknown@formal@none@1@S@The distance between a point and a group of points is computed using complete linkage, i.e.
as the maximum distance from the point to any member of the group (see the "Agglomerative hierarchical clustering" section about distance between clusters).@@@@1@39@@danf@17-8-2009 10201150@unknown@formal@none@1@S@=== Locality-sensitive hashing ===@@@@1@4@@danf@17-8-2009 10201160@unknown@formal@none@1@S@[[Locality-sensitive hashing]] can be used for clustering.@@@@1@7@@danf@17-8-2009 10201170@unknown@formal@none@1@S@Feature space vectors are sets, and the metric used is the [[Jaccard distance]].@@@@1@13@@danf@17-8-2009 10201180@unknown@formal@none@1@S@The feature space can be considered high-dimensional.@@@@1@7@@danf@17-8-2009 10201190@unknown@formal@none@1@S@The ''min-wise independent permutations'' LSH scheme (sometimes MinHash) is then used to put similar items into buckets.@@@@1@17@@danf@17-8-2009 10201200@unknown@formal@none@1@S@With just one set of hashing methods, there are only clusters of very similar elements.@@@@1@15@@danf@17-8-2009 10201210@unknown@formal@none@1@S@By seeding the hash functions several times (eg 20), it is possible to get bigger clusters.@@@@1@16@@danf@17-8-2009 10201220@unknown@formal@none@1@S@=== Graph-theoretic methods ===@@@@1@4@@danf@17-8-2009 10201230@unknown@formal@none@1@S@[[Formal concept analysis]] is a technique for generating clusters of objects and attributes, given a [[bipartite graph]] representing the relations between the objects and attributes.@@@@1@25@@danf@17-8-2009 10201240@unknown@formal@none@1@S@Other methods for generating ''overlapping clusters'' (a [[Cover (topology)|cover]] rather than a [[partition of a set|partition]]) are discussed by Jardine and Sibson (1968) and Cole and Wishart (1970).@@@@1@28@@danf@17-8-2009 10201250@unknown@formal@none@1@S@== Elbow criterion ==@@@@1@4@@danf@17-8-2009 10201260@unknown@formal@none@1@S@The elbow criterion is a common [[rule of thumb]] to determine what number of clusters should be chosen, for example for ''k''-means and agglomerative hierarchical clustering.@@@@1@26@@danf@17-8-2009 10201270@unknown@formal@none@1@S@It should also be noted that the initial assignment of cluster seeds has bearing on the final model performance.@@@@1@19@@danf@17-8-2009 10201280@unknown@formal@none@1@S@Thus, it is appropriate to re-run the cluster analysis multiple times.@@@@1@11@@danf@17-8-2009 10201290@unknown@formal@none@1@S@The elbow criterion says that you should choose a number of clusters so that adding another cluster doesn't add sufficient information.@@@@1@21@@danf@17-8-2009 10201300@unknown@formal@none@1@S@More precisely, if you graph the percentage of variance explained by the clusters against the number of clusters, the first clusters will add much information (explain a lot of variance), but at some point the marginal gain will drop, giving an angle in the graph (the elbow).@@@@1@47@@danf@17-8-2009 10201310@unknown@formal@none@1@S@This elbow cannot always be unambiguously identified.@@@@1@7@@danf@17-8-2009 10201320@unknown@formal@none@1@S@Percentage of variance explained is the ratio of the between-group variance to the total variance.@@@@1@15@@danf@17-8-2009 10201330@unknown@formal@none@1@S@On the following graph, the elbow is indicated by the red circle.@@@@1@12@@danf@17-8-2009 10201340@unknown@formal@none@1@S@The number of clusters chosen should therefore be 4.@@@@1@9@@danf@17-8-2009 10201350@unknown@formal@none@1@S@== Spectral clustering ==@@@@1@4@@danf@17-8-2009 10201360@unknown@formal@none@1@S@Given a set of data points A, the [[similarity matrix]] may be defined as a matrix S where S_{ij} 
represents a measure of the similarity between points i, j\\in A.@@@@1@30@@danf@17-8-2009 10201370@unknown@formal@none@1@S@Spectral clustering techniques make use of the [[Spectrum of a matrix|spectrum]] of the similarity matrix of the data to perform [[dimensionality reduction]] for clustering in fewer dimensions.@@@@1@27@@danf@17-8-2009 10201380@unknown@formal@none@1@S@One such technique is the ''[[Shi-Malik algorithm]]'', commonly used for [[segmentation (image processing)|image segmentation]].@@@@1@14@@danf@17-8-2009 10201390@unknown@formal@none@1@S@It partitions points into two sets (S_1,S_2) based on the [[eigenvector]] v corresponding to the second-smallest [[eigenvalue]] of the [[Laplacian matrix]]@@@@1@21@@danf@17-8-2009 10201400@unknown@formal@none@1@S@:L = I - D^{-1/2}SD^{-1/2}@@@@1@5@@danf@17-8-2009 10201410@unknown@formal@none@1@S@of S, where D is the diagonal matrix@@@@1@8@@danf@17-8-2009 10201420@unknown@formal@none@1@S@:D_{ii} = \\sum_{j} S_{ij}.@@@@1@4@@danf@17-8-2009 10201430@unknown@formal@none@1@S@This partitioning may be done in various ways, such as by taking the median m of the components in v, and placing all points whose component in v is greater than m in S_1, and the rest in S_2.@@@@1@39@@danf@17-8-2009 10201440@unknown@formal@none@1@S@The algorithm can be used for hierarchical clustering by repeatedly partitioning the subsets in this fashion.@@@@1@16@@danf@17-8-2009 10201450@unknown@formal@none@1@S@A related algorithm is the ''[[Meila-Shi algorithm]]'', which takes the [[eigenvector]]s corresponding to the ''k'' largest [[eigenvalue]]s of the matrix P = SD^{-1} for some ''k'', and then invokes another (e.g. ''k''-means) to cluster points by their respective ''k'' components in these eigenvectors.@@@@1@43@@danf@17-8-2009 10201460@unknown@formal@none@1@S@==Applications==@@@@1@1@@danf@17-8-2009 10201470@unknown@formal@none@1@S@=== Biology ===@@@@1@3@@danf@17-8-2009 10201480@unknown@formal@none@1@S@In [[biology]] '''clustering''' has many applications@@@@1@6@@danf@17-8-2009 10201490@unknown@formal@none@1@S@*In imaging, data clustering may take different form based on the data dimensionality.@@@@1@13@@danf@17-8-2009 10201500@unknown@formal@none@1@S@For example, the [http://wiki.stat.ucla.edu/socr/index.php/SOCR_EduMaterials_Activities_2D_PointSegmentation_EM_Mixture SOCR EM Mixture model segmentation activity and applet] shows how to obtain point, region or volume classification using the online [[SOCR]] computational libraries.@@@@1@27@@danf@17-8-2009 10201510@unknown@formal@none@1@S@*In the fields of [[plant]] and [[animal]] [[ecology]], clustering is used to describe and to make spatial and temporal comparisons of communities (assemblages) of organisms in heterogeneous environments; it is also used in [[Systematics|plant systematics]] to generate artificial [[Phylogeny|phylogenies]] or clusters of organisms (individuals) at the species, genus or higher level that share a number of attributes@@@@1@57@@danf@17-8-2009 10201520@unknown@formal@none@1@S@*In computational biology and [[bioinformatics]]:@@@@1@5@@danf@17-8-2009 10201530@unknown@formal@none@1@S@** In [[transcriptome|transcriptomics]], clustering is used to build groups of [[genes]] with related expression patterns (also known as coexpressed genes).@@@@1@20@@danf@17-8-2009 10201540@unknown@formal@none@1@S@Often such groups contain functionally related proteins, such as [[enzyme]]s for a specific [[metabolic pathway|pathway]], or genes that are co-regulated.@@@@1@20@@danf@17-8-2009 
10201550@unknown@formal@none@1@S@High throughput experiments using [[expressed sequence tag]]s (ESTs) or [[DNA microarray]]s can be a powerful tool for [[genome annotation]], a general aspect of [[genomics]].@@@@1@24@@danf@17-8-2009 10201560@unknown@formal@none@1@S@** In [[sequence analysis]], clustering is used to group homologous sequences into [[list of gene families|gene families]].@@@@1@17@@danf@17-8-2009 10201570@unknown@formal@none@1@S@This is a very important concept in bioinformatics, and [[evolutionary biology]] in general.@@@@1@13@@danf@17-8-2009 10201580@unknown@formal@none@1@S@See evolution by [[gene duplication]].@@@@1@5@@danf@17-8-2009 10201590@unknown@formal@none@1@S@** In high-throughput genotyping platforms clustering algorithms are used to automatically assign [[genotypes]].@@@@1@13@@danf@17-8-2009 10201600@unknown@formal@none@1@S@=== Medicine ===@@@@1@3@@danf@17-8-2009 10201610@unknown@formal@none@1@S@In [[medical imaging]], such as [[PET scan|PET scans]], cluster analysis can be used to differentiate between different types of [[tissue (biology)|tissue]] and [[blood]] in a three dimensional image.@@@@1@28@@danf@17-8-2009 10201620@unknown@formal@none@1@S@In this application, actual position does not matter, but the [[voxel]] intensity is considered as a [[coordinate vector|vector]], with a dimension for each image that was taken over time.@@@@1@29@@danf@17-8-2009 10201630@unknown@formal@none@1@S@This technique allows, for example, accurate measurement of the rate a radioactive tracer is delivered to the area of interest, without a separate sampling of [[arterial]] blood, an intrusive technique that is most common today.@@@@1@35@@danf@17-8-2009 10201640@unknown@formal@none@1@S@=== Market research ===@@@@1@4@@danf@17-8-2009 10201650@unknown@formal@none@1@S@Cluster analysis is widely used in [[market research]] when working with multivariate data from [[Statistical survey|surveys]] and test panels.@@@@1@19@@danf@17-8-2009 10201660@unknown@formal@none@1@S@Market researchers use cluster analysis to partition the general [[population]] of [[consumers]] into market segments and to better understand the relationships between different groups of consumers/potential [[customers]].@@@@1@27@@danf@17-8-2009 10201670@unknown@formal@none@1@S@* Segmenting the market and determining [[target market]]s@@@@1@8@@danf@17-8-2009 10201680@unknown@formal@none@1@S@* [[positioning (marketing)|Product positioning]]@@@@1@4@@danf@17-8-2009 10201690@unknown@formal@none@1@S@* [[New product development]]@@@@1@4@@danf@17-8-2009 10201700@unknown@formal@none@1@S@* Selecting test markets (see : [[experimental techniques]])@@@@1@8@@danf@17-8-2009 10201710@unknown@formal@none@1@S@=== Other applications ===@@@@1@4@@danf@17-8-2009 10201720@unknown@formal@none@1@S@'''Social network analysis''': In the study of [[social networks]], clustering may be used to recognize [[communities]] within large groups of people.@@@@1@21@@danf@17-8-2009 10201730@unknown@formal@none@1@S@'''Image segmentation''': Clustering can be used to divide a [[digital]] [[image]] into distinct regions for [[border detection]] or [[object recognition]].@@@@1@20@@danf@17-8-2009 10201740@unknown@formal@none@1@S@'''Data mining''': Many [[data mining]] applications involve partitioning data items into related subsets; the marketing applications discussed above represent some examples.@@@@1@21@@danf@17-8-2009 10201750@unknown@formal@none@1@S@Another common application is the division of documents, such as [[World Wide Web]] pages, into 
genres.@@@@1@16@@danf@17-8-2009 10201760@unknown@formal@none@1@S@'''Search result grouping''': In the process of intelligent grouping of the files and websites, clustering may be used to create a more relevant set of search results compared to normal search engines like [[Google]].@@@@1@34@@danf@17-8-2009 10201770@unknown@formal@none@1@S@There are currently a number of web based clustering tools such as [[Clusty]].@@@@1@13@@danf@17-8-2009 10201780@unknown@formal@none@1@S@'''Slippy map optimization''': [[Flickr]]'s map of photos and other map sites use clustering to reduce the number of markers on a map.@@@@1@22@@danf@17-8-2009 10201790@unknown@formal@none@1@S@This makes it both faster and reduces the amount of visual clutter.@@@@1@12@@danf@17-8-2009 10201800@unknown@formal@none@1@S@'''IMRT segmentation''': Clustering can be used to divide a fluence map into distinct regions for conversion into deliverable fields in MLC-based Radiation Therapy.@@@@1@23@@danf@17-8-2009 10201810@unknown@formal@none@1@S@'''Grouping of Shopping Items''': Clustering can be used to group all the shopping items available on the web into a set of unique products.@@@@1@24@@danf@17-8-2009 10201820@unknown@formal@none@1@S@For example, all the items on eBay can be grouped into unique products.@@@@1@13@@danf@17-8-2009 10201825@unknown@formal@none@1@S@(eBay doesn't have the concept of a SKU)@@@@1@8@@danf@17-8-2009 10201830@unknown@formal@none@1@S@'''[[Mathematical chemistry]]''': To find structural similarity, etc., for example, 3000 chemical compounds were clustered in the space of 90 [[topological index|topological indices]].@@@@1@22@@danf@17-8-2009 10201840@unknown@formal@none@1@S@'''Petroleum Geology''': Cluster Analysis is used to reconstruct missing bottom hole core data or missing log curves in order to evaluate reservoir properties.@@@@1@23@@danf@17-8-2009 10201850@unknown@formal@none@1@S@== Comparisons between data clusterings ==@@@@1@6@@danf@17-8-2009 10201860@unknown@formal@none@1@S@There have been several suggestions for a measure of similarity between two clusterings.@@@@1@13@@danf@17-8-2009 10201870@unknown@formal@none@1@S@Such a measure can be used to compare how well different data clustering algorithms perform on a set of data.@@@@1@20@@danf@17-8-2009 10201880@unknown@formal@none@1@S@Many of these measures are derived from the [[matching matrix]] (aka [[confusion matrix]]), e.g., the [[Rand index|Rand measure]] and the Fowlkes-Mallows ''B''''k'' measures.@@@@1@23@@danf@17-8-2009 10201890@unknown@formal@none@1@S@[[Marina Meila]]'s Variation of Information metric is a more recent approach for measuring distance between clusterings.@@@@1@16@@danf@17-8-2009 10201900@unknown@formal@none@1@S@It uses [[Mutual information|mutual information]] and [[entropy]] to approximate the distance between two clusterings across the lattice of possible clusterings.@@@@1@20@@danf@17-8-2009 10201910@unknown@formal@none@1@S@==Algorithms==@@@@1@1@@danf@17-8-2009 10201920@unknown@formal@none@1@S@In recent years considerable effort has been put into improving algorithm performance (Z. Huang, 1998).@@@@1@15@@danf@17-8-2009 10201930@unknown@formal@none@1@S@Among the most popular are ''CLARANS'' (Ng and Han,1994), ''[[DBSCAN]]'' (Ester et al., 1996) and ''BIRCH'' (Zhang et al., 1996).@@@@1@20@@danf@17-8-2009 10210010@unknown@formal@none@1@S@
Data mining
@@@@1@2@@danf@17-8-2009 10210020@unknown@formal@none@1@S@'''Data mining''' is the process of [[sorting]] through large amounts of data and picking out relevant information.@@@@1@17@@danf@17-8-2009 10210030@unknown@formal@none@1@S@It is usually used by [[business intelligence]] organizations, and [[financial analyst]]s, but is increasingly being used in the sciences to extract information from the enormous [[data set]]s generated by modern experimental and observational methods.@@@@1@34@@danf@17-8-2009 10210040@unknown@formal@none@1@S@It has been described as "the nontrivial extraction of implicit, previously unknown, and potentially useful [[information]] from [[data]]" and "the science of extracting useful information from large [[data set]]s or [[database]]s.@@@@1@31@@danf@17-8-2009 10210050@unknown@formal@none@1@S@" Data mining in relation to [[enterprise resource planning]] is the statistical and logical analysis of large sets of transaction data, looking for patterns that can aid decision making.@@@@1@29@@danf@17-8-2009 10210060@unknown@formal@none@1@S@==Background==@@@@1@1@@danf@17-8-2009 10210070@unknown@formal@none@1@S@Traditionally, business analysts have performed the task of extracting useful [[information]] from recorded [[data]], but the increasing volume of data in modern business and science calls for computer-based approaches.@@@@1@29@@danf@17-8-2009 10210080@unknown@formal@none@1@S@As [[data set]]s have grown in size and complexity, there has been a shift away from direct hands-on data analysis toward indirect, automatic data analysis using more complex and sophisticated tools.@@@@1@31@@danf@17-8-2009 10210090@unknown@formal@none@1@S@The modern technologies of [[computers]], [[networks]], and [[sensors]] have made [[data collection]] and organization much easier.@@@@1@16@@danf@17-8-2009 10210100@unknown@formal@none@1@S@However, the captured data needs to be converted into [[information]] and [[knowledge]] to become useful.@@@@1@15@@danf@17-8-2009 10210110@unknown@formal@none@1@S@Data mining is the entire process of applying computer-based [[methodology]], including new techniques for [[knowledge discovery]], to data.@@@@1@18@@danf@17-8-2009 10210120@unknown@formal@none@1@S@Data mining identifies trends within data that go beyond simple analysis.@@@@1@11@@danf@17-8-2009 10210130@unknown@formal@none@1@S@Through the use of sophisticated algorithms, non-statistician users have the opportunity to identify key attributes of business processes and target opportunities.@@@@1@21@@danf@17-8-2009 10210140@unknown@formal@none@1@S@However, abdicating control of this process from the statistician to the machine may result in false-positives or no useful results at all.@@@@1@22@@danf@17-8-2009 10210150@unknown@formal@none@1@S@Although data mining is a relatively new term, the technology is not.@@@@1@12@@danf@17-8-2009 10210160@unknown@formal@none@1@S@For many years, businesses have used powerful computers to sift through volumes of data such as supermarket scanner data to produce market research reports (although reporting is not considered to be data mining).@@@@1@33@@danf@17-8-2009 10210170@unknown@formal@none@1@S@Continuous innovations in computer processing power, disk storage, and statistical software are dramatically increasing the accuracy and usefulness of data analysis.@@@@1@21@@danf@17-8-2009 10210180@unknown@formal@none@1@S@Web 2.0 technologies have generated a colossal amount of user-generated data and media, making it hard to aggregate and consume information in a meaningful 
way without getting overloaded.@@@@1@28@@danf@17-8-2009 10210190@unknown@formal@none@1@S@Given the size of the data on the Internet, and the difficulty in contextualizing it, it is unclear whether the traditional approach to data mining is computationally viable.@@@@1@28@@danf@17-8-2009 10210200@unknown@formal@none@1@S@The term data mining is often used to apply to the two separate processes of knowledge discovery and [[prediction]].@@@@1@19@@danf@17-8-2009 10210210@unknown@formal@none@1@S@Knowledge discovery provides explicit information that has a readable form and can be understood by a user.@@@@1@17@@danf@17-8-2009 10210220@unknown@formal@none@1@S@[[Forecasting]], or [[predictive modeling]] provides predictions of future events and may be transparent and readable in some approaches (e.g., rule-based systems) and opaque in others such as [[neural network]]s.@@@@1@29@@danf@17-8-2009 10210230@unknown@formal@none@1@S@Moreover, some data-mining systems such as neural networks are inherently geared towards prediction and pattern recognition, rather than knowledge discovery.@@@@1@20@@danf@17-8-2009 10210240@unknown@formal@none@1@S@[[Metadata]], or data about a given data set, are often expressed in a condensed ''data-minable'' format, or one that facilitates the practice of data mining.@@@@1@25@@danf@17-8-2009 10210250@unknown@formal@none@1@S@Common examples include executive summaries and scientific abstracts.@@@@1@8@@danf@17-8-2009 10210260@unknown@formal@none@1@S@Data mining relies on the use of real world data.@@@@1@10@@danf@17-8-2009 10210270@unknown@formal@none@1@S@This data is extremely vulnerable to [[collinearity]] precisely because data from the real world may have unknown interrelations.@@@@1@18@@danf@17-8-2009 10210280@unknown@formal@none@1@S@An unavoidable weakness of data mining is that the critical data that may expose any relationship might have never been observed.@@@@1@21@@danf@17-8-2009 10210290@unknown@formal@none@1@S@Alternative approaches using an experiment-based approach such as [[Choice Modelling]] for human-generated data may be used.@@@@1@16@@danf@17-8-2009 10210300@unknown@formal@none@1@S@Inherent correlations are either controlled for or removed altogether through the construction of an [[experimental design]].@@@@1@16@@danf@17-8-2009 10210310@unknown@formal@none@1@S@Recently, there were some efforts to define a standard for data mining, for example the [[CRISP-DM]] standard for analysis processes or the [[Java Data-Mining]] Standard.@@@@1@25@@danf@17-8-2009 10210320@unknown@formal@none@1@S@Independent of these standardization efforts, freely available open-source software systems like [[RapidMiner]] and [[Weka (machine learning)| Weka]] have become an informal standard for defining data-mining processes.@@@@1@26@@danf@17-8-2009 10210330@unknown@formal@none@1@S@==Privacy concerns==@@@@1@2@@danf@17-8-2009 10210340@unknown@formal@none@1@S@There are also [[privacy]] and [[human rights]] concerns associated with data mining, specifically regarding the source of the data analyzed.@@@@1@20@@danf@17-8-2009 10210350@unknown@formal@none@1@S@Data mining provides information that may be difficult to obtain otherwise.@@@@1@11@@danf@17-8-2009 10210360@unknown@formal@none@1@S@When the data collected involves individual people, there are many questions concerning privacy, legality, and ethics.@@@@1@16@@danf@17-8-2009 10210370@unknown@formal@none@1@S@In particular, data mining government or commercial data sets for national security or law enforcement purposes has raised privacy 
concerns.@@@@1@20@@danf@17-8-2009 10210380@unknown@formal@none@1@S@==Notable uses of data mining==@@@@1@5@@danf@17-8-2009 10210390@unknown@formal@none@1@S@===Combatting Terrorism===@@@@1@2@@danf@17-8-2009 10210400@unknown@formal@none@1@S@Data mining has been cited as the method by which the U.S. Army unit [[Able Danger]] had identified the [[September 11, 2001 attacks]] leader, [[Mohamed Atta]], and three other 9/11 hijackers as possible members of an [[Al Qaeda]] cell operating in the U.S. more than a year before the attack.@@@@1@50@@danf@17-8-2009 10210410@unknown@formal@none@1@S@It has been suggested that both the [[Central Intelligence Agency]] and the [[Canadian Security Intelligence Service]] have employed this method.@@@@1@20@@danf@17-8-2009 10210420@unknown@formal@none@1@S@Previous data mining to stop terrorist programs under the US government include the Terrorism Information Awareness (TIA) program, Computer-Assisted Passenger Prescreening System (CAPPS II), Analysis, Dissemination, Visualization, Insight, and Semantic Enhancement (ADVISE), Multistate Anti-Terrorism Information Exchange (MATRIX), and the Secure Flight program [http://www.msnbc.msn.com/id/20604775/ Security-MSNBC].@@@@1@44@@danf@17-8-2009 10210430@unknown@formal@none@1@S@These programs have been discontinued due to controversy over whether they violate the US Constitution's 4th amendment.@@@@1@17@@danf@17-8-2009 10210440@unknown@formal@none@1@S@===Games===@@@@1@1@@danf@17-8-2009 10210450@unknown@formal@none@1@S@Since the early 1960s, with the availability of [[Oracle machine|oracle]]s for certain [[combinatorial game]]s, also called [[tablebase]]s (e.g. for 3x3-chess) with any beginning configuration, small-board [[dots-and-boxes]], small-board-hex, and certain endgames in chess, dots-and-boxes, and hex; a new area for data mining has been opened up.@@@@1@45@@danf@17-8-2009 10210460@unknown@formal@none@1@S@This is the extraction of human-usable strategies from these oracles.@@@@1@10@@danf@17-8-2009 10210470@unknown@formal@none@1@S@Current pattern recognition approaches do not seem to fully have the required high level of abstraction in order to be applied successfully.@@@@1@22@@danf@17-8-2009 10210480@unknown@formal@none@1@S@Instead, extensive experimentation with the tablebases, combined with an intensive study of tablebase-answers to well designed problems and with knowledge of prior art, i.e. pre-tablebase knowledge, is used to yield insightful patterns.@@@@1@32@@danf@17-8-2009 10210490@unknown@formal@none@1@S@[[Berlekamp]] in dots-and-boxes etc. 
and [[John Nunn]] in [[chess]] [[Chess endgame|endgames]] are notable examples of researchers doing this work, though they were not and are not involved in tablebase generation.@@@@1@30@@danf@17-8-2009 10210500@unknown@formal@none@1@S@===Business===@@@@1@1@@danf@17-8-2009 10210510@unknown@formal@none@1@S@Data mining in [[customer relationship management]] applications can contribute significantly to the bottom line.@@@@1@14@@danf@17-8-2009 10210520@unknown@formal@none@1@S@Rather than contacting a prospect or customer through a call center or sending mail, only prospects that are predicted to have a high likelihood of responding to an offer are contacted.@@@@1@31@@danf@17-8-2009 10210530@unknown@formal@none@1@S@More sophisticated methods may be used to optimize across campaigns so that we can predict which channel and which offer an individual is most likely to respond to - across all potential offers.@@@@1@33@@danf@17-8-2009 10210540@unknown@formal@none@1@S@Finally, in cases where many people will take an action without an offer, uplift modeling can be used to determine which people will have the greatest increase in responding if given an offer.@@@@1@33@@danf@17-8-2009 10210550@unknown@formal@none@1@S@[[Data clustering]] can also be used to automatically discover the segments or groups within a customer data set.@@@@1@18@@danf@17-8-2009 10210560@unknown@formal@none@1@S@Businesses employing data mining quickly see a return on investment, but they also recognize that the number of predictive models can quickly become very large.@@@@1@25@@danf@17-8-2009 10210570@unknown@formal@none@1@S@Rather than one model to predict which customers will [[Churning (stock trade)|churn]], a business could build a separate model for each region and customer type.@@@@1@25@@danf@17-8-2009 10210580@unknown@formal@none@1@S@Then, instead of sending an offer to all people that are likely to churn, it may want to send offers only to those customers who are likely to accept the offer.@@@@1@29@@danf@17-8-2009 10210590@unknown@formal@none@1@S@Finally, it may also want to determine which customers are going to be profitable over a window of time and only send the offers to those that are likely to be profitable.@@@@1@33@@danf@17-8-2009 10210600@unknown@formal@none@1@S@In order to maintain this quantity of models, they need to manage model versions and move to ''automated data mining''.@@@@1@20@@danf@17-8-2009 10210610@unknown@formal@none@1@S@Data mining can also be helpful to human-resources departments in identifying the characteristics of their most successful employees.@@@@1@18@@danf@17-8-2009 10210620@unknown@formal@none@1@S@Information obtained, such as universities attended by highly successful employees, can help HR focus recruiting efforts accordingly.@@@@1@17@@danf@17-8-2009 10210630@unknown@formal@none@1@S@Additionally, Strategic Enterprise Management applications help a company translate corporate-level goals, such as profit and margin share targets, into operational decisions, such as production plans and workforce levels.@@@@1@28@@danf@17-8-2009 10210640@unknown@formal@none@1@S@Another example of data mining, often called [[market basket analysis]], relates to its use in retail sales.@@@@1@18@@danf@17-8-2009 10210650@unknown@formal@none@1@S@If a clothing store records the purchases of customers, a data-mining system could identify those customers who favour silk shirts over cotton ones.@@@@1@23@@danf@17-8-2009 10210660@unknown@formal@none@1@S@Although some explanations of such relationships may be difficult, taking advantage of them is easier.@@@@1@14@@danf@17-8-2009 10210670@unknown@formal@none@1@S@The example deals with [[association rule]]s within transaction-based data.@@@@1@9@@danf@17-8-2009
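A hedged sketch of such an association rule is shown below. The transactions and items are invented for illustration; the computation of support and confidence, however, is the standard one for rules of the form "customers who buy X also tend to buy Y".

<source lang="python">
# Toy market-basket example: estimate support and confidence for the
# rule {silk shirt} -> {tie} over a handful of invented transactions.
transactions = [
    {"silk shirt", "tie", "belt"},
    {"silk shirt", "tie"},
    {"cotton shirt", "belt"},
    {"silk shirt", "socks"},
    {"cotton shirt", "tie"},
]

def support(itemset):
    """Fraction of transactions containing every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """P(consequent | antecedent), estimated from the transactions."""
    return support(antecedent | consequent) / support(antecedent)

rule_from, rule_to = {"silk shirt"}, {"tie"}
print("support   :", support(rule_from | rule_to))    # 2/5 = 0.4
print("confidence:", confidence(rule_from, rule_to))  # 2/3, roughly 0.67
</source>

A rule would typically be reported only if both values exceed user-chosen minimum support and confidence thresholds.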
10210680@unknown@formal@none@1@S@Not all data are transaction-based, and logical or inexact [[rule]]s may also be present within a [[database]].@@@@1@18@@danf@17-8-2009 10210690@unknown@formal@none@1@S@In a manufacturing application, an inexact rule may state that 73% of products which have a specific defect or problem will develop a secondary problem within the next six months.@@@@1@30@@danf@17-8-2009 10210700@unknown@formal@none@1@S@Related to an integrated-circuit production line, an example of data mining is described in the paper "Mining IC Test Data to Optimize VLSI Testing."@@@@1@24@@danf@17-8-2009 10210710@unknown@formal@none@1@S@In this paper, the application of data mining and decision analysis to the problem of die-level functional test is described.@@@@1@20@@danf@17-8-2009 10210720@unknown@formal@none@1@S@Experiments mentioned in this paper demonstrate the feasibility of applying a system that mines historical die-test data to create a probabilistic model of patterns of die failure, which is then used to decide in real time which die to test next and when to stop testing.@@@@1@46@@danf@17-8-2009 10210730@unknown@formal@none@1@S@This system has been shown, based on experiments with historical test data, to have the potential to improve profits on mature IC products.@@@@1@23@@danf@17-8-2009 10210740@unknown@formal@none@1@S@===Science and engineering===@@@@1@3@@danf@17-8-2009 10210750@unknown@formal@none@1@S@In recent years, data mining has been widely used in areas of science and engineering, such as [[bioinformatic]]s, [[genetic]]s, [[medicine]], [[education]], and [[electrical power]] engineering.@@@@1@25@@danf@17-8-2009 10210760@unknown@formal@none@1@S@In the study of human genetics, an important goal is to understand the mapping relationship between the inter-individual variation in human [[DNA]] sequences and variability in disease susceptibility.@@@@1@30@@danf@17-8-2009 10210770@unknown@formal@none@1@S@In lay terms, the aim is to find out how changes in an individual's DNA sequence affect the risk of developing common diseases such as [[cancer]].@@@@1@26@@danf@17-8-2009 10210780@unknown@formal@none@1@S@This is very important to help improve the diagnosis, prevention and treatment of these diseases.@@@@1@15@@danf@17-8-2009 10210790@unknown@formal@none@1@S@The data mining technique that is used to perform this task is known as [[multifactor dimensionality reduction]].@@@@1@17@@danf@17-8-2009 10210800@unknown@formal@none@1@S@In the area of electrical power engineering, data mining techniques have been widely used for [[condition monitoring]] of high voltage electrical equipment.@@@@1@22@@danf@17-8-2009 10210810@unknown@formal@none@1@S@The purpose of condition monitoring is to obtain valuable information on the health status of the equipment's [[insulation]].@@@@1@18@@danf@17-8-2009 10210820@unknown@formal@none@1@S@[[Data clustering]] techniques such as the [[self-organizing map]] (SOM) have been applied to the vibration monitoring and analysis of transformer on-load tap-changers (OLTCs).@@@@1@20@@danf@17-8-2009 10210830@unknown@formal@none@1@S@Using vibration monitoring, it can be observed that each tap change operation generates a signal that contains information about the condition of the tap changer contacts and the drive mechanisms.@@@@1@30@@danf@17-8-2009 10210840@unknown@formal@none@1@S@Obviously,
different tap positions will generate different signals.@@@@1@8@@danf@17-8-2009 10210850@unknown@formal@none@1@S@However, there was considerable variability amongst normal condition signals for the exact same tap position.@@@@1@15@@danf@17-8-2009 10210860@unknown@formal@none@1@S@SOM has been applied to detect abnormal conditions and to estimate the nature of the abnormalities.@@@@1@16@@danf@17-8-2009 10210870@unknown@formal@none@1@S@Data mining techniques have also been applied for [[dissolved gas analysis]] (DGA) on [[power transformer]]s.@@@@1@15@@danf@17-8-2009 10210880@unknown@formal@none@1@S@DGA, as a diagnostic technique for power transformers, has been available for many years.@@@@1@12@@danf@17-8-2009 10210890@unknown@formal@none@1@S@Data mining techniques such as SOM have been applied to analyse the data and to determine trends which are not obvious to the standard DGA ratio techniques such as the Duval Triangle.@@@@1@30@@danf@17-8-2009 10210900@unknown@formal@none@1@S@A fourth area of application for data mining in science and engineering is within educational research, where data mining has been used to study the factors leading students to choose to engage in behaviors which reduce their learning, and to understand the factors influencing university student retention.@@@@1@45@@danf@17-8-2009 10210910@unknown@formal@none@1@S@Other examples of applying data mining techniques are [[biomedical]] data facilitated by domain ontologies, mining clinical trial data, [[traffic analysis]] using SOM, et cetera.@@@@1@25@@danf@17-8-2009 10220010@unknown@formal@none@1@S@
10220010@unknown@formal@none@1@S@Data set
@@@@1@2@@danf@17-8-2009 10220020@unknown@formal@none@1@S@A '''data set''' (or '''dataset''') is a collection of [[data]], usually presented in tabular form.@@@@1@15@@danf@17-8-2009 10220030@unknown@formal@none@1@S@Each column represents a particular variable.@@@@1@6@@danf@17-8-2009 10220040@unknown@formal@none@1@S@Each row corresponds to a given member of the data set in question.@@@@1@13@@danf@17-8-2009 10220050@unknown@formal@none@1@S@It lists values for each of the variables, such as the height and weight of an object, or values of random numbers.@@@@1@21@@danf@17-8-2009 10220060@unknown@formal@none@1@S@Each value is known as a [[datum]].@@@@1@7@@danf@17-8-2009 10220070@unknown@formal@none@1@S@The data set may comprise data for one or more members, corresponding to the number of rows.@@@@1@17@@danf@17-8-2009 10220080@unknown@formal@none@1@S@Historically, the term originated in the [[mainframe computer|mainframe field]], where it had a [[Data set (IBM mainframe)|well-defined meaning]], very close to contemporary ''[[computer file]]''.@@@@1@24@@danf@17-8-2009 10220090@unknown@formal@none@1@S@This topic is not covered here.@@@@1@6@@danf@17-8-2009 10220100@unknown@formal@none@1@S@In the simplest case, there is only one variable, and then the data set consists of a single column of values, often represented as a list.@@@@1@26@@danf@17-8-2009 10220110@unknown@formal@none@1@S@The values may be numbers, such as [[real number]]s or [[integer]]s, for example representing a person's height in centimeters, but may also be [[nominal data]] (i.e., not consisting of [[numerical]] values), for example representing a person's ethnicity.@@@@1@37@@danf@17-8-2009 10220120@unknown@formal@none@1@S@More generally, values may be of any of the kinds described as a [[level of measurement]].@@@@1@16@@danf@17-8-2009 10220130@unknown@formal@none@1@S@For each variable, the values will normally all be of the same kind.@@@@1@13@@danf@17-8-2009 10220140@unknown@formal@none@1@S@However, there may also be "[[missing values]]", which need to be indicated in some way.@@@@1@15@@danf@17-8-2009 10220150@unknown@formal@none@1@S@In [[statistics]], data sets usually come from actual observations obtained by [[sampling (statistics)|sampling]] a [[statistical population]], and each row corresponds to the observations on one element of that population.@@@@1@29@@danf@17-8-2009 10220160@unknown@formal@none@1@S@Data sets may further be generated by [[algorithms]] for the purpose of testing certain kinds of [[software]].@@@@1@17@@danf@17-8-2009 10220170@unknown@formal@none@1@S@Some modern statistical analysis software packages, such as [[PSPP]], still present their data in the classical dataset fashion.@@@@1@17@@danf@17-8-2009 10220180@unknown@formal@none@1@S@== Classic data sets ==@@@@1@5@@danf@17-8-2009 10220190@unknown@formal@none@1@S@Several classic [[data set]]s have been used extensively in the [[statistical]] literature:@@@@1@12@@danf@17-8-2009 10220200@unknown@formal@none@1@S@* [[Iris flower data set]] - multivariate data set introduced by [[Ronald Fisher]] (1936).@@@@1@14@@danf@17-8-2009 10220210@unknown@formal@none@1@S@* ''[[Categorical data analysis]]'' - Data sets used in the book, ''An Introduction to Categorical Data Analysis'', by Agresti are [http://lib.stat.cmu.edu/datasets/agresti provided on-line by StatLib.]@@@@1@25@@danf@17-8-2009 10220220@unknown@formal@none@1@S@*''[[Robust statistics]]'' - Data sets used in ''Robust Regression and Outlier Detection'' (Rousseeuw and Leroy, 1986). 
[http://www.uni-koeln.de/themen/Statistik/data/rousseeuw/ Provided on-line at the University of Cologne.]@@@@1@24@@danf@17-8-2009 10220230@unknown@formal@none@1@S@*''[[Time series]]'' - Data used in Chatfield's book, ''The Analysis of Time Series'', are [http://lib.stat.cmu.edu/modules.php?op=modload&name=PostWrap&file=index&page=datasets/ provided on-line by StatLib.]@@@@1@19@@danf@17-8-2009 10220240@unknown@formal@none@1@S@*''Extreme values'' - Data used in the book, ''An Introduction to the Statistical Modeling of Extreme Values'' are [http://homes.stat.unipd.it/coles/public_html/ismev/ismev.dat provided on-line by Stuart Coles], the book's author.@@@@1@27@@danf@17-8-2009 10220250@unknown@formal@none@1@S@*''Bayesian Data Analysis'' - Data used in the book, ''[[Bayesian]] Data Analysis'', are [http://www.stat.columbia.edu/~gelman/book/data/ provided on-line by Andrew Gelman], one of the book's authors.@@@@1@24@@danf@17-8-2009 10220260@unknown@formal@none@1@S@* The [ftp://ftp.ics.uci.edu/pub/machine-learning-databases/liver-disorders Bupa liver data], used in several papers in the machine learning (data mining) literature.@@@@1@17@@danf@17-8-2009
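The tabular structure described at the start of this article (columns as variables, rows as members, with occasional missing values) can be shown with a minimal Python sketch; the variables and numbers below are invented for illustration and are not taken from any of the classic data sets listed above:

<source lang="python">
# Minimal sketch of a tiny data set in tabular form: each column is a
# variable, each row is one member, and None marks a missing value.
# All values below are made up for illustration.

columns = ["height_cm", "weight_kg", "ethnicity"]   # two numeric variables, one nominal
rows = [
    (182.0, 79.5, "A"),
    (165.5, None, "B"),   # weight is missing for this member
    (171.2, 63.0, "A"),
]

def observed(name):
    """Return all non-missing values of one variable (one column)."""
    i = columns.index(name)
    return [row[i] for row in rows if row[i] is not None]

if __name__ == "__main__":
    heights = observed("height_cm")
    print("members (rows):", len(rows))
    print("mean height:", round(sum(heights) / len(heights), 1))
</source>

A classic data set such as the [[Iris flower data set]] has exactly the same shape, with one row per flower and one column per measured variable.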
10230010@unknown@formal@none@1@S@ELIZA
@@@@1@1@@danf@17-8-2009 10230020@unknown@formal@none@1@S@'''ELIZA''' is a [[computer program]] by [[Joseph Weizenbaum]], designed in [[1966]], which parodied a [[Rogerian psychotherapy|Rogerian therapist]], largely by rephrasing many of the patient's statements as questions and posing them to the patient.@@@@1@33@@danf@17-8-2009 10230030@unknown@formal@none@1@S@Thus, for example, the response to "My head hurts" might be "Why do you say your head hurts?"@@@@1@18@@danf@17-8-2009 10230040@unknown@formal@none@1@S@The response to "My mother hates me" might be "Who else in your family hates you?"@@@@1@16@@danf@17-8-2009 10230050@unknown@formal@none@1@S@ELIZA was named after Eliza Doolittle, a working-class character in [[George Bernard Shaw|George Bernard Shaw's]] play ''[[Pygmalion (play)|Pygmalion]]'', who is taught to speak with an [[upper class]] [[accent (linguistics)|accent]].@@@@1@29@@danf@17-8-2009 10230060@unknown@formal@none@1@S@==Overview==@@@@1@1@@danf@17-8-2009 10230070@unknown@formal@none@1@S@It is sometimes inaccurately said that ELIZA simulates a therapist.@@@@1@10@@danf@17-8-2009 10230080@unknown@formal@none@1@S@Weizenbaum said that ELIZA provided a "[[parody]]" of "the responses of a non-directional psychotherapist in an initial psychiatric interview."@@@@1@19@@danf@17-8-2009 10230090@unknown@formal@none@1@S@He chose the context of psychotherapy to "sidestep the problem of giving the program a data base of real-world knowledge", the therapeutic situation being one of the few real human situations in which a human being can reply to a statement with a question that indicates very little specific knowledge of the topic under discussion.@@@@1@55@@danf@17-8-2009 10230100@unknown@formal@none@1@S@For example, it is a context in which the question "Who is your favorite composer?" can be answered acceptably with responses such as "What about your own favorite composer?" 
or "Does that question interest you?"@@@@1@35@@danf@17-8-2009 10230110@unknown@formal@none@1@S@First implemented in Weizenbaum's own [[SLIP (programming language)|SLIP]] list-processing language, ELIZA worked by simple [[parsing]] and substitution of key words into canned phrases.@@@@1@23@@danf@17-8-2009 10230120@unknown@formal@none@1@S@Depending upon the initial entries by the user the illusion of a human writer could be instantly dispelled, or could continue through several interchanges.@@@@1@24@@danf@17-8-2009 10230130@unknown@formal@none@1@S@It was sometimes so convincing that there are many anecdotes about people becoming very emotionally caught up in dealing with ELIZA for several minutes until the machine's true lack of understanding became apparent.@@@@1@33@@danf@17-8-2009 10230140@unknown@formal@none@1@S@This was likely due to people's tendency to attach meanings to words which the computer never put there.@@@@1@18@@danf@17-8-2009 10230150@unknown@formal@none@1@S@In 1966, interactive computing (via a teletype) was new.@@@@1@9@@danf@17-8-2009 10230160@unknown@formal@none@1@S@It was 15 years before the personal computer became familiar to the general public, and two decades before most people encountered attempts at [[natural language processing]] in Internet services like [[Ask.com]] or PC help systems such as Microsoft Office [[Office Assistant|Clippy]].@@@@1@41@@danf@17-8-2009 10230170@unknown@formal@none@1@S@Although those programs included years of research and work (while ''[[Ecala]]'' eclipsed the functionality of ''ELIZA'' after less than two weeks of work by a single programmer), ''ELIZA'' remains a milestone simply because it was the first time a programmer had attempted such a human-machine interaction with the goal of creating the illusion (however brief) of human-''human'' interaction.@@@@1@58@@danf@17-8-2009 10230180@unknown@formal@none@1@S@In the article "theNewMediaReader" an excerpt from "From Computer Power and Human Reason" by Joseph Weizenbaum in 1976, edited by Noah Wardrip-Fruin and Nick Montfort he references how quickly and deeply people became emotionally involved with the computer program, taking offence when he asked to view the transcripts, saying it was an invasion of their privacy, even asking him to leave the room while they were working with ELIZA.@@@@1@69@@danf@17-8-2009 10230190@unknown@formal@none@1@S@==Influence on games==@@@@1@3@@danf@17-8-2009 10230200@unknown@formal@none@1@S@ELIZA impacted a number of early [[computer games]] by demonstrating additional kinds of [[interface design]]s.@@@@1@15@@danf@17-8-2009 10230210@unknown@formal@none@1@S@[[Don Daglow]] wrote an enhanced version of the program called ''Ecala'' on a [[PDP-10]] [[mainframe computer]] at [[Pomona College]] in [[1973]] before writing what was possibly the second or third computer [[role-playing game]], ''[[Dungeon (computer game)|Dungeon]]'' ([[1975]]) (The first was probably "[[dnd (computer game)|dnd]]", written on and for the PLATO system in 1974, and the second may have been [[Moria]], written in 1975).@@@@1@63@@danf@17-8-2009 10230220@unknown@formal@none@1@S@It is likely that ''ELIZA'' was also on the system where [[Will Crowther]] created ''[[Colossal Cave Adventure|Adventure]]'', the 1975 game that spawned the [[interactive fiction]] genre.@@@@1@26@@danf@17-8-2009 10230230@unknown@formal@none@1@S@But both these games appeared some nine years after the original ''ELIZA''.@@@@1@12@@danf@17-8-2009 10230240@unknown@formal@none@1@S@==Response and legacy==@@@@1@3@@danf@17-8-2009 
10230250@unknown@formal@none@1@S@Lay responses to ELIZA were disturbing to Weizenbaum and motivated him to write his book ''Computer Power and Human Reason: From Judgment to Calculation'', in which he explains the limits of computers and makes clear his opinion that anthropomorphic views of computers amount to a reduction of the human being, and indeed of any life form.@@@@1@64@@danf@17-8-2009 10230260@unknown@formal@none@1@S@There are many programs based on ELIZA in different languages in addition to ''Ecala''.@@@@1@14@@danf@17-8-2009 10230270@unknown@formal@none@1@S@For example, in 1980, a company called "Don't Ask Software", founded by Randy Simon, created a version for the Apple II, Atari, and Commodore PCs, which verbally abused the user based on the user's input.@@@@1@35@@danf@17-8-2009 10230280@unknown@formal@none@1@S@In Spain, Jordi Perez developed the famous ZEBAL in 1993, written in [[Clipper programming language|Clipper]] for MS-DOS.@@@@1@17@@danf@17-8-2009 10230290@unknown@formal@none@1@S@Other versions adapted ELIZA around a religious theme, such as ones featuring Jesus (both serious and comedic) and another Apple II variant called ''I Am Buddha''.@@@@1@26@@danf@17-8-2009 10230300@unknown@formal@none@1@S@The 1980 game ''[[The Prisoner (computer game)|The Prisoner]]'' incorporated ELIZA-style interaction within its gameplay.@@@@1@14@@danf@17-8-2009 10230310@unknown@formal@none@1@S@ELIZA has also inspired a [[podcast]] called "The Eliza Podcast", in which the host engages in self-analysis using a computer-generated voice prompting with questions in the same style as the ELIZA program.@@@@1@33@@danf@17-8-2009 10230320@unknown@formal@none@1@S@==Implementations==@@@@1@1@@danf@17-8-2009 10230330@unknown@formal@none@1@S@* Using [[JavaScript]]: http://www.manifestation.com/neurotoys/eliza.php3@@@@1@4@@danf@17-8-2009 10230340@unknown@formal@none@1@S@* Source code in [[Java (programming language)|Java]]: http://chayden.net/eliza/Eliza.html@@@@1@8@@danf@17-8-2009 10230350@unknown@formal@none@1@S@* Another [[Java (programming language)|Java]] implementation of ELIZA: http://www.wedesoft.demon.co.uk/eliza/@@@@1@8@@danf@17-8-2009 10230360@unknown@formal@none@1@S@* Using [[C (programming language)|C]] on the [[TI-89]]: http://kaikostack.com/ti89_en.htm#eliza@@@@1@9@@danf@17-8-2009 10230370@unknown@formal@none@1@S@* Using [[z80#The Z80 assembly language|z80 Assembly]] on the [[TI-83#TI-83 Plus|TI-83 Plus]]: http://www.ticalc.org/archives/files/fileinfo/354/35463.html@@@@1@13@@danf@17-8-2009 10230380@unknown@formal@none@1@S@* A [[perl module]] [http://search.cpan.org/dist/Chatbot-Eliza/ Chatbot::Eliza] — [http://www.terrence.com/perl/eliza/eliza.cgi example implementation]@@@@1@10@@danf@17-8-2009 10230390@unknown@formal@none@1@S@* Trans-Tex Software has released shareware versions for Classic Mac OS and Mac OS X: http://www.tex-edit.com/index.html#Eliza@@@@1@16@@danf@17-8-2009 10230400@unknown@formal@none@1@S@* doctor.el (circa [[1985]]) in [[Emacs]].@@@@1@6@@danf@17-8-2009 10230410@unknown@formal@none@1@S@* Source code in [[Tcl]]: [http://wiki.tcl.tk/9235 http://wiki.tcl.tk/9235]@@@@1@7@@danf@17-8-2009 10230420@unknown@formal@none@1@S@* The [http://www.indyproject.org Indy] [[Delphi]]-oriented TCP/IP components suite has an Eliza implementation as a demo.@@@@1@15@@danf@17-8-2009 10230430@unknown@formal@none@1@S@*[http://www.cs.bham.ac.uk/research/projects/cogaff/eliza Pop-11 Eliza] in the [[poplog]] system.@@@@1@7@@danf@17-8-2009 
10230440@unknown@formal@none@1@S@Goes back to about 1976, when it was used for teaching AI at [[Sussex University]].@@@@1@15@@danf@17-8-2009 10230450@unknown@formal@none@1@S@Now part of the free open source Poplog system.@@@@1@9@@danf@17-8-2009 10230460@unknown@formal@none@1@S@* Source code in [[BASIC]]: http://www.atariarchives.org/bigcomputergames/showpage.php?page=22@@@@1@6@@danf@17-8-2009 10230470@unknown@formal@none@1@S@* ECC-Eliza for Windows (actual program is for DOS, but unpacker is for Windows) (rename .txt to .exe before running): http://www5.domaindlx.com/ecceliza1/ecceliza.txt.@@@@1@21@@danf@17-8-2009 10230480@unknown@formal@none@1@S@More recent version at http://web.archive.org/web/20041117123025/http://www5.domaindlx.com/ecceliza1/ecceliza.txt.@@@@1@5@@danf@17-8-2009
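The keyword spotting and substitution of matched fragments into canned phrases described in the overview above can be illustrated with a minimal Python sketch; the patterns and response templates below are invented for illustration and are not Weizenbaum's original SLIP script nor any of the implementations listed above:

<source lang="python">
# Minimal sketch of ELIZA-style keyword spotting and canned-phrase
# substitution.  The rules below are illustrative only.
import random
import re

RULES = [
    (re.compile(r"\bmy (.+) hurts\b", re.I),
     ["Why do you say your {0} hurts?"]),
    (re.compile(r"\bmy (mother|father|brother|sister)\b", re.I),
     ["Who else in your family comes to mind when you think of your {0}?",
      "Tell me more about your {0}."]),
    (re.compile(r"\bi am (.+)", re.I),
     ["How long have you been {0}?"]),
]
DEFAULTS = ["Please go on.", "Why do you say that?"]

def respond(utterance):
    """Echo a matched fragment back inside a canned question, ELIZA-style."""
    for pattern, templates in RULES:
        match = pattern.search(utterance)
        if match:
            return random.choice(templates).format(*match.groups())
    return random.choice(DEFAULTS)

if __name__ == "__main__":
    print(respond("My head hurts"))        # "Why do you say your head hurts?"
    print(respond("My mother hates me"))   # echoes "mother" back as a question
</source>

A fuller implementation would also rank keywords and reflect pronouns (turning "my" into "your", which the first template simply hard-codes), but the basic parse-and-substitute mechanism is the one described above.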
10240010@unknown@formal@none@1@S@English language
@@@@1@2@@danf@17-8-2009 10240020@unknown@formal@none@1@S@'''English''' is an [[Indo-European languages|Indo-European]], [[West Germanic languages|West Germanic language]] originating in [[England]], and is the [[first language]] for most people in the [[United Kingdom]], the [[United States]], [[Canada]], [[Australia]], [[New Zealand]], [[Republic of Ireland|Ireland]], and the [[Anglophone Caribbean]].@@@@1@39@@danf@17-8-2009 10240030@unknown@formal@none@1@S@It is used extensively as a [[second language]] and as an [[official language]] throughout the world, especially in [[Commonwealth of Nations|Commonwealth]] countries and in many [[international organization]]s.@@@@1@27@@danf@17-8-2009 10240040@unknown@formal@none@1@S@==Significance==@@@@1@1@@danf@17-8-2009 10240050@unknown@formal@none@1@S@Modern English, sometimes described as the first global [[lingua franca]], is the [[Linguistic imperialism|dominant]] [[international auxiliary language|international language]] in [[communication]]s, [[science]], [[business]], [[aviation]], [[entertainment]], [[radio]] and [[diplomacy]].@@@@1@27@@danf@17-8-2009 10240060@unknown@formal@none@1@S@The initial reason for its enormous spread beyond the bounds of the [[British Isles]] where it was originally a native tongue was the [[British Empire]], and by the late nineteenth century its influence had won a truly global reach.@@@@1@39@@danf@17-8-2009 10240070@unknown@formal@none@1@S@It is the dominant language in the [[United States]] and the growing economic and cultural influence of that [[federal union]] as a global [[superpower]] since [[World War II]] has significantly accelerated adoption of English as a language across the planet.@@@@1@40@@danf@17-8-2009 10240080@unknown@formal@none@1@S@A working knowledge of English has become a requirement in a number of fields, occupations and professions such as medicine and as a consequence over a billion people speak English to at least a basic level (see [[English language learning and teaching]]).@@@@1@42@@danf@17-8-2009 10240090@unknown@formal@none@1@S@Linguists such as [[David Crystal]] recognize that one impact of this massive growth of English, in common with other global languages, has been to reduce native [[Natural language#Linguistic diversity|linguistic diversity]] in many parts of the world historically, most particularly in [[Australasia]] and [[North America]], and its huge influence continues to play an important role in [[language attrition]].@@@@1@57@@danf@17-8-2009 10240100@unknown@formal@none@1@S@By a similar token, [[historical linguistics|historical linguists]], aware of the complex and fluid dynamics of [[language change]], are always alive to the potential English contains through the vast size and spread of the communities that use it and its natural internal variety, such as in its [[English-based creole languages|creoles]] and [[pidgin]]s, to produce a new [[language family|family]] of distinct languages over time.@@@@1@62@@danf@17-8-2009 10240110@unknown@formal@none@1@S@English is one of six official languages of the [[United Nations]].@@@@1@11@@danf@17-8-2009 10240120@unknown@formal@none@1@S@==History==@@@@1@1@@danf@17-8-2009 10240130@unknown@formal@none@1@S@English is a [[West Germanic languages|West Germanic]] language that originated from the [[Anglo-Frisian languages|Anglo-Frisian]] dialects brought to [[Great Britain|Britain]] by Germanic settlers and Roman auxiliary troops from various parts of what is now northwest Germany and the Northern 
[[Netherlands]].@@@@1@39@@danf@17-8-2009 10240140@unknown@formal@none@1@S@Initially, [[Old English language|Old English]] was a diverse group of dialects, reflecting the varied origins of the Anglo-Saxon Kingdoms of [[England]].@@@@1@21@@danf@17-8-2009 10240150@unknown@formal@none@1@S@One of these dialects, Late West Saxon, eventually came to dominate.@@@@1@11@@danf@17-8-2009 10240160@unknown@formal@none@1@S@The original Old English language was then influenced by two waves of invasion.@@@@1@13@@danf@17-8-2009 10240170@unknown@formal@none@1@S@The first was by language speakers of the [[North Germanic languages|Scandinavian]] branch of the Germanic family; they conquered and colonized parts of Britain in the 8th and 9th centuries.@@@@1@29@@danf@17-8-2009 10240180@unknown@formal@none@1@S@The second was the [[Normans]] in the 11th century, who spoke Old Norman and ultimately developed an English variety of this called [[Anglo-Norman]].@@@@1@23@@danf@17-8-2009 10240190@unknown@formal@none@1@S@These two invasions caused English to become "mixed" to some degree (though it was never a truly mixed language in the strict linguistic sense of the word; mixed languages arise from the cohabitation of speakers of different languages, who develop a hybrid tongue for basic communication).@@@@1@46@@danf@17-8-2009 10240200@unknown@formal@none@1@S@Cohabitation with the Scandinavians resulted in a significant grammatical simplification and lexical supplementation of the Anglo-Frisian core of English; the later [[Normans|Norman]] occupation led to the grafting onto that Germanic core of a more elaborate layer of words from the [[Italic languages|Italic]] branch of the European languages.@@@@1@47@@danf@17-8-2009 10240210@unknown@formal@none@1@S@This Norman influence entered English largely through the courts and government.@@@@1@11@@danf@17-8-2009 10240220@unknown@formal@none@1@S@Thus, English developed into a "borrowing" language of great flexibility and with a huge vocabulary.@@@@1@15@@danf@17-8-2009 10240230@unknown@formal@none@1@S@== Classification and related languages ==@@@@1@6@@danf@17-8-2009 10240240@unknown@formal@none@1@S@The English language belongs to the western sub-branch of the [[Germanic languages|Germanic branch]] of the [[Indo-European languages|Indo-European]] family of languages.@@@@1@20@@danf@17-8-2009 10240250@unknown@formal@none@1@S@The closest living relative of English is [[Scots language|Scots]], spoken primarily in Scotland and parts of Northern Ireland, which is viewed by linguists as either a separate language or a group of dialects of English.@@@@1@35@@danf@17-8-2009 10240260@unknown@formal@none@1@S@The next closest relative to English after Scots is [[Frisian languages|Frisian]], spoken in the Northern Netherlands and Northwest Germany.@@@@1@19@@danf@17-8-2009 10240270@unknown@formal@none@1@S@Other less closely related living [[West Germanic languages]] include [[Dutch language|Dutch]], [[Low German]], [[German language|German]] and [[Afrikaans]].@@@@1@17@@danf@17-8-2009 10240280@unknown@formal@none@1@S@The [[North Germanic languages]] of Scandinavia are less closely related to English than the West Germanic languages.@@@@1@17@@danf@17-8-2009 10240290@unknown@formal@none@1@S@Many [[French language|French]] words are also intelligible to an English speaker (though pronunciations are often quite different) because English absorbed a large vocabulary from [[Norman language|Norman]] and French, via [[Anglo-Norman]] after the Norman Conquest and directly from French in subsequent 
centuries.@@@@1@41@@danf@17-8-2009 10240300@unknown@formal@none@1@S@As a result, a large portion of English vocabulary is derived from French, with some minor spelling differences (word endings, use of old French spellings, etc.), as well as occasional divergences in meaning, in so-called "faux amis", or [[false friend]]s.@@@@1@40@@danf@17-8-2009 10240310@unknown@formal@none@1@S@The pronunciation of French loanwords in English has become completely anglicized and follows a typically Germanic pattern of stress.@@@@1@19@@danf@17-8-2009 10240320@unknown@formal@none@1@S@== Geographical distribution ==@@@@1@4@@danf@17-8-2009 10240330@unknown@formal@none@1@S@Approximately 375 million people speak English as their first language.@@@@1@10@@danf@17-8-2009 10240340@unknown@formal@none@1@S@English today is probably the third largest language by number of native speakers, after [[Mandarin (linguistics)|Mandarin Chinese]] and [[Spanish language|Spanish]].@@@@1@20@@danf@17-8-2009 10240350@unknown@formal@none@1@S@However, when combining native and non-native speakers it is probably the most commonly spoken language in the world, though possibly second to a combination of the [[Chinese language]]s, depending on whether or not distinctions in the latter are classified as "languages" or "dialects."@@@@1@43@@danf@17-8-2009 10240360@unknown@formal@none@1@S@Estimates that include [[second language]] speakers vary greatly from 470 million to over a billion depending on how [[literacy]] or mastery is defined.@@@@1@23@@danf@17-8-2009 10240370@unknown@formal@none@1@S@There are some who claim that non-native speakers now outnumber native speakers by a ratio of 3 to 1.@@@@1@19@@danf@17-8-2009 10240380@unknown@formal@none@1@S@The countries with the highest populations of native English speakers are, in descending order: United States (215 million), United Kingdom (58 million), Canada (18.2 million), Australia (15.5 million), [[Republic of Ireland|Ireland]] (3.8 million), South Africa (3.7 million), and New Zealand (3.0-3.7 million).@@@@1@42@@danf@17-8-2009 10240390@unknown@formal@none@1@S@Countries such as [[Jamaica]] and [[Nigeria]] also have millions of native speakers of [[dialect continuum|dialect continua]] ranging from an [[English-based creole languages|English-based creole]] to a more standard version of English.@@@@1@30@@danf@17-8-2009 10240400@unknown@formal@none@1@S@Of those nations where English is spoken as a second language, India has the most such speakers ('[[Indian English]]') and linguistics professor [[David Crystal]] claims that, combining native and non-native speakers, India now has more people who speak or understand English than any other country in the world.@@@@1@48@@danf@17-8-2009 10240410@unknown@formal@none@1@S@Following India is the [[People's Republic of China]].@@@@1@8@@danf@17-8-2009 10240420@unknown@formal@none@1@S@===Countries in order of total speakers===@@@@1@6@@danf@17-8-2009 10240430@unknown@formal@none@1@S@English is the primary language in [[Anguilla]], [[Antigua and Barbuda]], Australia ([[Australian English]]), the [[The Bahamas|Bahamas]], [[Barbados]], [[Bermuda]], [[Belize]] ([[Belizean Kriol language|Belizean Kriol]]), the [[British Indian Ocean Territory]], the [[British Virgin Islands]], Canada ([[Canadian English]]), the [[Cayman Islands]], the [[Falkland Islands]], [[Gibraltar]], [[Grenada]], [[Guam]], [[Guernsey]] ([[Channel Island English]]), [[Guyana]], Ireland ([[Hiberno-English]]), [[Isle of Man]] ([[Manx English]]), Jamaica ([[Jamaican English]]), [[Jersey]], 
[[Montserrat]], [[Nauru]], New Zealand ([[New Zealand English]]), [[Pitcairn Islands]], [[Saint Helena]], [[Saint Kitts and Nevis]], [[Saint Vincent and the Grenadines]], [[Singapore]], [[South Georgia and the South Sandwich Islands]], [[Trinidad and Tobago]], the [[Turks and Caicos Islands]], the United Kingdom, the [[United States Virgin Islands|U.S. Virgin Islands]], and the United States.@@@@1@110@@danf@17-8-2009 10240440@unknown@formal@none@1@S@In many other countries, where English is not the most spoken language, it is an official language; these countries include [[Botswana]], [[Cameroon]], [[Dominica]], [[Fiji]], the [[Federated States of Micronesia]], [[Ghana]], [[The Gambia|Gambia]], [[India]], [[Kenya]], [[Kiribati]], [[Lesotho]], [[Liberia]], [[Madagascar]], [[Malta]], the [[Marshall Islands]], [[Mauritius]], [[Namibia]], [[Nigeria]], [[Pakistan]], [[Palau]], [[Papua New Guinea]], the [[Philippines]], [[Puerto Rico]], [[Rwanda]], the [[Solomon Islands]], [[Saint Lucia]], [[Samoa]], [[Seychelles]], [[Sierra Leone]], [[Sri Lanka]], [[Swaziland]], [[Tanzania]], [[Uganda]], [[Zambia]], and [[Zimbabwe]].@@@@1@72@@danf@17-8-2009 10240450@unknown@formal@none@1@S@It is also one of the 11 official languages that are given equal status in South Africa ([[South African English]]).@@@@1@20@@danf@17-8-2009 10240460@unknown@formal@none@1@S@English is also the official language in current [[dependent territory|dependent territories]] of Australia ([[Norfolk Island]], [[Christmas Island]] and [[Cocos Island]]) and of the United States ([[Northern Mariana Islands]], [[American Samoa]] and [[Puerto Rico]]), and in the former British colony of [[Hong Kong]].@@@@1@42@@danf@17-8-2009 10240470@unknown@formal@none@1@S@English is an important language in several former [[colony|colonies]] and [[protectorate]]s of the United Kingdom but falls short of official status, such as in [[Malaysia]], [[Brunei]], [[United Arab Emirates]] and [[Bahrain]].@@@@1@31@@danf@17-8-2009 10240480@unknown@formal@none@1@S@English is also not an official language in either the United States or the United Kingdom.@@@@1@16@@danf@17-8-2009 10240490@unknown@formal@none@1@S@Although the United States federal government has no official languages, English has been given official status by 30 of the 50 state governments.@@@@1@23@@danf@17-8-2009 10240500@unknown@formal@none@1@S@===English as a global language===@@@@1@5@@danf@17-8-2009 10240510@unknown@formal@none@1@S@Because English is so widely spoken, it has often been referred to as a "[[world language]]", the ''[[lingua franca]]'' of the modern era.@@@@1@23@@danf@17-8-2009 10240520@unknown@formal@none@1@S@While English is not an official language in most countries, it is currently the language most often taught as a [[second language]] around the world.@@@@1@25@@danf@17-8-2009 10240530@unknown@formal@none@1@S@Some linguists believe that it is no longer the exclusive cultural sign of "native English speakers", but is rather a language that is absorbing aspects of cultures worldwide as it continues to grow.@@@@1@33@@danf@17-8-2009 10240540@unknown@formal@none@1@S@It is, by international treaty, the official language for aerial and maritime communications.@@@@1@13@@danf@17-8-2009 10240550@unknown@formal@none@1@S@English is an official language of the [[United Nations]] and many other international organizations, including the [[International Olympic Committee]].@@@@1@19@@danf@17-8-2009 10240560@unknown@formal@none@1@S@English is the language most often studied as a foreign 
language in the European Union (by 89% of schoolchildren), followed by French (32%), German (18%), and Spanish (8%).@@@@1@28@@danf@17-8-2009 10240570@unknown@formal@none@1@S@In the EU, a large fraction of the population reports being able to converse to some extent in English.@@@@1@19@@danf@17-8-2009 10240580@unknown@formal@none@1@S@Among non-English speaking countries, a large percentage of the population claimed to be able to converse in English in the [[Netherlands]] (87%), [[Sweden]] (85%), [[Denmark]] (83%), [[Luxembourg]] (66%), [[Finland]] (60%), [[Slovenia]] (56%), [[Austria]] (53%), [[Belgium]] (52%), and [[Germany]] (51%).@@@@1@39@@danf@17-8-2009 10240590@unknown@formal@none@1@S@[[Norway]] and [[Iceland]] also have a large majority of competent English-speakers.@@@@1@11@@danf@17-8-2009 10240600@unknown@formal@none@1@S@[[Book]]s, [[magazine]]s, and [[newspaper]]s written in English are available in many countries around the world.@@@@1@15@@danf@17-8-2009 10240610@unknown@formal@none@1@S@English is also the most commonly used language in the [[science]]s.@@@@1@11@@danf@17-8-2009 10240620@unknown@formal@none@1@S@In 1997, the [[Science Citation Index]] reported that 95% of its articles were written in English, even though only half of them came from authors in English-speaking countries.@@@@1@28@@danf@17-8-2009 10240630@unknown@formal@none@1@S@=== Dialects and regional varieties ===@@@@1@6@@danf@17-8-2009 10240640@unknown@formal@none@1@S@The expansion of the British Empire and—since WWII—the primacy of the United States have spread English throughout the globe.@@@@1@19@@danf@17-8-2009 10240650@unknown@formal@none@1@S@Because of that global spread, English has developed a host of [[List of dialects of the English language|English dialects]] and English-based [[creole language]]s and [[pidgin]]s.@@@@1@25@@danf@17-8-2009 10240660@unknown@formal@none@1@S@The major [[Variety (linguistics)|varieties]] of English include, in most cases, several subvarieties, such as [[Cockney]] within [[British English]]; [[Newfoundland English]] within [[Canadian English]]; and [[African American Vernacular English]] ("Ebonics") and [[Southern American English]] within [[American English]].@@@@1@36@@danf@17-8-2009 10240670@unknown@formal@none@1@S@English is a [[pluricentric language]], without a central language authority like France's [[Académie française]]; and, although no variety is clearly considered the only standard, there are a number of accents considered to be more prestigious, such as [[Received Pronunciation]] in Britain.@@@@1@41@@danf@17-8-2009 10240680@unknown@formal@none@1@S@[[Scots language|Scots]] developed—largely independently—from the same origins, but following the [[Acts of Union 1707]] a process of [[language attrition]] began, whereby successive generations adopted more and more features from English causing dialectalisation.@@@@1@32@@danf@17-8-2009 10240690@unknown@formal@none@1@S@Whether it is now a separate language or a [[dialect]] of English better described as [[Scottish English]] is in dispute.@@@@1@20@@danf@17-8-2009 10240700@unknown@formal@none@1@S@The pronunciation, grammar and lexis of the traditional forms differ, sometimes substantially, from other varieties of English.@@@@1@17@@danf@17-8-2009 10240710@unknown@formal@none@1@S@Because of the wide use of English as a second language, English speakers have many different [[Accent (linguistics)|accents]], which often signal the speaker's native dialect or language.@@@@1@27@@danf@17-8-2009 
10240720@unknown@formal@none@1@S@For the more distinctive characteristics of regional accents, see [[Regional accents of English]], and for the more distinctive characteristics of regional dialects, see [[List of dialects of the English language]].@@@@1@30@@danf@17-8-2009 10240730@unknown@formal@none@1@S@Just as English itself has borrowed words from many different languages over its history, English [[loanword]]s now appear in a great many languages around the world, indicative of the technological and cultural influence of its speakers.@@@@1@36@@danf@17-8-2009 10240740@unknown@formal@none@1@S@Several [[pidgin]]s and [[creole language]]s have formed using an English base, such as [[Jamaican (language)|Jamaican Patois]], [[Nigerian Pidgin]], and [[Tok Pisin]].@@@@1@21@@danf@17-8-2009 10240750@unknown@formal@none@1@S@There are many words in English coined to describe forms of particular non-English languages that contain a very high proportion of English words.@@@@1@23@@danf@17-8-2009 10240760@unknown@formal@none@1@S@[[Franglais]], for example, is used to describe French with a very high English word content; it is found on the [[Channel Islands]].@@@@1@22@@danf@17-8-2009 10240770@unknown@formal@none@1@S@Another variant, spoken in the border bilingual regions of Québec in Canada, is called [[Franglais#Frenglish|Frenglish]].@@@@1@15@@danf@17-8-2009 10240780@unknown@formal@none@1@S@In [[Wales]], which is part of the United Kingdom, the languages of [[Welsh language|Welsh]] and English are sometimes mixed together by fluent or comfortable Welsh speakers, the result of which is called [[Welsh English|Wenglish]].@@@@1@34@@danf@17-8-2009 10240790@unknown@formal@none@1@S@=== Constructed varieties of English ===@@@@1@6@@danf@17-8-2009 10240800@unknown@formal@none@1@S@* [[Basic English]] is simplified for easy international use.@@@@1@9@@danf@17-8-2009 10240810@unknown@formal@none@1@S@It is used by manufacturers and other international businesses to write manuals and communicate.@@@@1@14@@danf@17-8-2009 10240820@unknown@formal@none@1@S@Some English schools in Asia teach it as a practical subset of English for use by beginners.@@@@1@17@@danf@17-8-2009 10240830@unknown@formal@none@1@S@* [[Special English]] is a simplified version of English used by the [[Voice of America]].@@@@1@15@@danf@17-8-2009 10240840@unknown@formal@none@1@S@It uses a vocabulary of only 1500 words.@@@@1@8@@danf@17-8-2009 10240850@unknown@formal@none@1@S@* [[English spelling reform|English reform]] is an attempt to improve collectively upon the English language.@@@@1@15@@danf@17-8-2009 10240860@unknown@formal@none@1@S@* [[Seaspeak]] and the related [[NATO phonetic alphabet|Airspeak]] and Policespeak, all based on restricted vocabularies, were designed by [[Edward Johnson]] in the 1980s to aid international cooperation and communication in specific areas.@@@@1@32@@danf@17-8-2009 10240870@unknown@formal@none@1@S@There is also a [[tunnelspeak]] for use in the [[Channel Tunnel]].@@@@1@11@@danf@17-8-2009 10240880@unknown@formal@none@1@S@* [[Euro-English]] is a concept of standardising English for use as a second language in continental Europe.@@@@1@17@@danf@17-8-2009 10240890@unknown@formal@none@1@S@* [[Manually Coded English]] — a variety of systems have been developed to represent the English language with hand signals, designed primarily for use in deaf education.@@@@1@27@@danf@17-8-2009 10240900@unknown@formal@none@1@S@These should not be confused with true sign languages such as [[British Sign Language]] and [[American Sign 
Language]] used in Anglophone countries, which are independent and not based on English.@@@@1@30@@danf@17-8-2009 10240910@unknown@formal@none@1@S@* [[E-Prime]] excludes forms of the verb ''to be''.@@@@1@9@@danf@17-8-2009 10240920@unknown@formal@none@1@S@Euro-English (also ''EuroEnglish'' or ''Euro-English'') terms are English translations of European concepts that are not native to English-speaking countries.@@@@1@19@@danf@17-8-2009 10240930@unknown@formal@none@1@S@Because of the United Kingdom's (and even the Republic of Ireland's) involvement in the European Union, the usage focuses on non-British concepts.@@@@1@22@@danf@17-8-2009 10240940@unknown@formal@none@1@S@This kind of Euro-English was parodied when English was "made" one of the constituent languages of [[Europanto]].@@@@1@17@@danf@17-8-2009 10240950@unknown@formal@none@1@S@== Phonology ==@@@@1@3@@danf@17-8-2009 10240960@unknown@formal@none@1@S@=== Vowels ===@@@@1@3@@danf@17-8-2009 10240970@unknown@formal@none@1@S@'''Notes:'''@@@@1@1@@danf@17-8-2009 10240980@unknown@formal@none@1@S@It is the [[vowel]]s that differ most from region to region.@@@@1@11@@danf@17-8-2009 10240990@unknown@formal@none@1@S@Where symbols appear in pairs, the first corresponds to American English, [[General American]] accent; the second corresponds to British English, [[Received Pronunciation]].@@@@1@22@@danf@17-8-2009 10241000@unknown@formal@none@1@S@# American English lacks this sound; words with this sound are pronounced with {{IPA | /ɑ/}} or {{IPA | /ɔ/}}.@@@@1@20@@danf@17-8-2009 10241010@unknown@formal@none@1@S@See [[Phonological history of English low back vowels#Lot-cloth split|''Lot-cloth split'']].@@@@1@10@@danf@17-8-2009 10241020@unknown@formal@none@1@S@# Some dialects of North American English do not have this vowel.@@@@1@12@@danf@17-8-2009 10241030@unknown@formal@none@1@S@See [[phonological history of English low_back vowels#Cot-caught merger|''Cot-caught merger'']].@@@@1@9@@danf@17-8-2009 10241040@unknown@formal@none@1@S@# The North American variation of this sound is a [[r-colored vowel|rhotic vowel]].@@@@1@13@@danf@17-8-2009 10241050@unknown@formal@none@1@S@# Many speakers of North American English do not distinguish between these two unstressed vowels.@@@@1@15@@danf@17-8-2009 10241060@unknown@formal@none@1@S@For them, ''roses'' and ''Rosa's'' are pronounced the same, and the symbol usually used is [[schwa]] {{IPA | /ə/}}.@@@@1@19@@danf@17-8-2009 10241070@unknown@formal@none@1@S@# This sound is often transcribed with {{IPA | /i/}} or with {{IPA | /ɪ/}}.@@@@1@15@@danf@17-8-2009 10241080@unknown@formal@none@1@S@# The diphthongs {{IPA | /eɪ/}} and {{IPA | /oʊ/}} are monophthongal for many General American speakers, as {{IPA | /eː/}} and {{IPA | /oː/}}.@@@@1@25@@danf@17-8-2009 10241090@unknown@formal@none@1@S@# The letter <''U''> can represent either {{IPA|/u/}} or the [[iotation|iotated]] vowel {{IPA|/ju/}}.@@@@1@13@@danf@17-8-2009 10241100@unknown@formal@none@1@S@In BRP, if this iotated vowel {{IPA|/ju/}} occurs after {{IPA|/t/}}, {{IPA|/d/}}, {{IPA|/s/}} or {{IPA|/z/}}, it often triggers palatalization of the preceding consonant, turning it to {{IPA|/ʨ/}}, {{IPA|/ʥ/}}, {{IPA|/ɕ/}} and {{IPA|/ʑ/}} respectively, as in ''tune'', ''during'', ''sugar'', and ''azure''.@@@@1@38@@danf@17-8-2009 10241110@unknown@formal@none@1@S@In American English, palatalization does not generally happen unless the {{IPA|/ju/}} is followed by ''r'', with the result that {{IPA|/(t, d,s, z)jur/}} turn to {{IPA|/tʃɚ/}}, {{IPA|/dʒɚ/}}, {{IPA|/ʃɚ/}} and {{IPA|/ʒɚ/}} 
respectively, as in ''nature'', ''verdure'', ''sure'', and ''treasure''.@@@@1@37@@danf@17-8-2009 10241120@unknown@formal@none@1@S@# [[Vowel length]] plays a phonetic role in the majority of English dialects, and is said to be phonemic in a few dialects, such as [[Australian English]] and [[New Zealand English]].@@@@1@31@@danf@17-8-2009 10241130@unknown@formal@none@1@S@In certain dialects of the modern English language, for instance [[General American]], there is allophonic vowel length: vowel phonemes are realized as long vowel allophones before voiced consonant phonemes in the coda of a syllable.@@@@1@35@@danf@17-8-2009 10241140@unknown@formal@none@1@S@Before the [[Great Vowel Shift]], vowel length was phonemically contrastive.@@@@1@10@@danf@17-8-2009 10241150@unknown@formal@none@1@S@# This sound only occurs in non-rhotic accents.@@@@1@8@@danf@17-8-2009 10241160@unknown@formal@none@1@S@In some accents, this sound may be {{IPA|/ɔ:/}} instead of {{IPA|/ʊə/}}.@@@@1@11@@danf@17-8-2009 10241170@unknown@formal@none@1@S@See [[English-language vowel changes before historic r]].@@@@1@7@@danf@17-8-2009 10241180@unknown@formal@none@1@S@# This sound only occurs in non-rhotic accents.@@@@1@8@@danf@17-8-2009 10241190@unknown@formal@none@1@S@In some accents, the schwa offglide of {{IPA|/ɛə/}} may be dropped, monophthongising and lengthening the sound to {{IPA|/ɛ:/}}.@@@@1@18@@danf@17-8-2009 10241200@unknown@formal@none@1@S@See also [[IPA chart for English dialects]] for more vowel charts.@@@@1@11@@danf@17-8-2009 10241210@unknown@formal@none@1@S@=== Consonants ===@@@@1@3@@danf@17-8-2009 10241220@unknown@formal@none@1@S@This is the English consonantal system using symbols from the [[International Phonetic Alphabet]] (IPA).@@@@1@14@@danf@17-8-2009 10241230@unknown@formal@none@1@S@# The [[velar nasal]] {{IPA | [ŋ]}} is a non-phonemic allophone of /n/ in some northerly British accents, appearing only before /k/ and /g/.@@@@1@24@@danf@17-8-2009 10241240@unknown@formal@none@1@S@In all other dialects it is a separate phoneme, although it only occurs in [[syllable coda]]s.@@@@1@16@@danf@17-8-2009 10241250@unknown@formal@none@1@S@# The [[alveolar tap]] {{IPA | [ɾ]}} is an allophone of /t/ and /d/ in unstressed syllables in [[North American English]] and [[Australian English]].@@@@1@24@@danf@17-8-2009 10241260@unknown@formal@none@1@S@This is the sound of ''tt'' or ''dd'' in the words ''latter'' and ''ladder'', which are homophones for many speakers of North American English.@@@@1@24@@danf@17-8-2009 10241270@unknown@formal@none@1@S@In some accents such as [[Scottish English]] and [[Indian English]] it replaces {{IPA|/ɹ/}}.@@@@1@13@@danf@17-8-2009 10241280@unknown@formal@none@1@S@This is the same sound represented by single ''r'' in most varieties of [[Spanish language|Spanish]].@@@@1@15@@danf@17-8-2009 10241290@unknown@formal@none@1@S@# In some dialects, such as [[Cockney]], the interdentals /θ/ and /ð/ are usually merged with /f/ and /v/, and in others, like [[African American Vernacular English]], /ð/ is merged with dental /d/.@@@@1@33@@danf@17-8-2009 10241300@unknown@formal@none@1@S@In some Irish varieties, /θ/ and /ð/ become the corresponding dental plosives, which then contrast with the usual alveolar plosives.@@@@1@20@@danf@17-8-2009 10241310@unknown@formal@none@1@S@# The sounds {{IPA | /ʃ/, /ʒ/, and /ɹ/}} are labialised in some dialects.@@@@1@14@@danf@17-8-2009 10241320@unknown@formal@none@1@S@Labialisation is never contrastive in initial position and therefore is sometimes not 
transcribed.@@@@1@13@@danf@17-8-2009 10241330@unknown@formal@none@1@S@Most speakers of [[General American]] realize {{IPA|/ɹ/}} (always rhoticized) as the [[retroflex approximant]] {{IPA|/ɻ/}}, whereas the same is realized in [[Scottish English]], etc. as the [[alveolar trill]].@@@@1@27@@danf@17-8-2009 10241340@unknown@formal@none@1@S@# The [[voiceless palatal fricative]] /ç/ is in most accents just an [[allophone]] of /h/ before /j/; for instance ''human'' /çjuːmən/.@@@@1@21@@danf@17-8-2009 10241350@unknown@formal@none@1@S@However, in some accents (see [[Phonological history of English consonant clusters|this]]), the /j/ is dropped, but the initial consonant is the same.@@@@1@22@@danf@17-8-2009 10241360@unknown@formal@none@1@S@# The [[voiceless velar fricative]] /x/ is used by Scottish or Welsh speakers of English for Scots/Gaelic words such as ''loch'' {{IPA | /lɒx/}} or by some speakers for loanwords from German and Hebrew like ''Bach'' {{IPA|/bax/}} or ''Chanukah'' /xanuka/. /x/ is also used in South African English.@@@@1@48@@danf@17-8-2009 10241370@unknown@formal@none@1@S@In some dialects such as [[Scouse]] ([[Liverpool]]) either {{IPA|[x]}} or the [[affricate consonant|affricate]] {{IPA|[kx]}} may be used as an [[allophone]] of /k/ in words such as ''docker'' {{IPA | [dɒkxə]}}.@@@@1@30@@danf@17-8-2009 10241380@unknown@formal@none@1@S@Most native speakers have a great deal of trouble pronouncing it correctly when learning a foreign language.@@@@1@17@@danf@17-8-2009 10241390@unknown@formal@none@1@S@Most speakers use the sounds [k] and [h] instead.@@@@1@9@@danf@17-8-2009 10241400@unknown@formal@none@1@S@# Voiceless w {{IPA | [ʍ]}} is found in Scottish and Irish English, as well as in some varieties of American, New Zealand, and English English.@@@@1@26@@danf@17-8-2009 10241410@unknown@formal@none@1@S@In most other dialects it is merged with /w/; in some dialects of Scots it is merged with /f/.@@@@1@19@@danf@17-8-2009 10241420@unknown@formal@none@1@S@==== Voicing and aspiration ====@@@@1@5@@danf@17-8-2009 10241430@unknown@formal@none@1@S@[[Voice (phonetics)|Voicing]] and [[aspiration (phonetics)|aspiration]] of [[stop consonant]]s in English depend on dialect and context, but a few general rules can be given:@@@@1@23@@danf@17-8-2009 10241440@unknown@formal@none@1@S@* Voiceless [[stop consonant|plosives]] and [[affricate consonant|affricates]] (/{{IPA | p}}/, /{{IPA | t}}/, /{{IPA | k}}/, and /{{IPA | tʃ}}/) are aspirated when they are word-initial or begin a stressed syllable — compare ''pin'' {{IPA | [pʰɪn]}} and ''spin'' {{IPA | [spɪn]}}, ''crap'' {{IPA | [kʰɹ̥æp]}} and ''scrap'' {{IPA | [skɹæp]}}.@@@@1@51@@danf@17-8-2009 10241450@unknown@formal@none@1@S@** In some dialects, aspiration extends to unstressed syllables as well.@@@@1@11@@danf@17-8-2009 10241460@unknown@formal@none@1@S@** In other dialects, such as [[Indian English]], all voiceless stops remain unaspirated.@@@@1@13@@danf@17-8-2009 10241470@unknown@formal@none@1@S@* Word-initial voiced plosives may be devoiced in some dialects.@@@@1@10@@danf@17-8-2009 10241480@unknown@formal@none@1@S@* Word-terminal voiceless plosives may be unreleased or accompanied by a glottal stop in some dialects (e.g. many varieties of [[American English]]) — examples: ''tap'' [{{IPA |tʰæp̚}}], ''sack'' [{{IPA |sæk̚}}].@@@@1@30@@danf@17-8-2009 10241490@unknown@formal@none@1@S@* Word-terminal voiced plosives may be devoiced in some dialects (e.g. 
some varieties of [[American English]]) — examples: ''sad'' [{{IPA |sæd̥}}], ''bag'' [{{IPA |bæɡ̊}}].@@@@1@24@@danf@17-8-2009 10241500@unknown@formal@none@1@S@In other dialects they are fully voiced in final position, but only partially voiced in initial position.@@@@1@17@@danf@17-8-2009 10241510@unknown@formal@none@1@S@=== Supra-segmental features ===@@@@1@4@@danf@17-8-2009 10241520@unknown@formal@none@1@S@==== Tone groups ====@@@@1@4@@danf@17-8-2009 10241530@unknown@formal@none@1@S@English is an [[Intonation (linguistics)|intonation language]]. This means that the [[pitch (music)|pitch]] of the [[human voice|voice]] is used [[Syntax|syntactically]], for example, to convey [[surprise (emotion)|surprise]] and [[irony]], or to change a [[sentence (linguistics)|statement]] into a [[question]].@@@@1@36@@danf@17-8-2009 10241540@unknown@formal@none@1@S@In English, intonation patterns are on groups of words, which are called tone groups, tone units, intonation groups or sense groups.@@@@1@21@@danf@17-8-2009 10241550@unknown@formal@none@1@S@Tone groups are said on a single breath and, as a consequence, are of limited length, more often being on average five words long or lasting roughly two seconds.@@@@1@29@@danf@17-8-2009 10241560@unknown@formal@none@1@S@For example:@@@@1@2@@danf@17-8-2009 10241570@unknown@formal@none@1@S@: -{{IPA | /duː juː niːd ˈɛnɪˌθɪŋ/}} ''Do you need anything?''@@@@1@11@@danf@17-8-2009 10241580@unknown@formal@none@1@S@: -{{IPA | /aɪ dəʊnt | nəʊ/}} ''I don't, no''@@@@1@10@@danf@17-8-2009 10241590@unknown@formal@none@1@S@: -{{IPA | /aɪ dəʊnt nəʊ/}} ''I don't know'' (contracted to, for example, -{{IPA | /aɪ dəʊnəʊ/}} or {{IPA | /aɪ dənəʊ/}} ''I dunno'' in fast or colloquial speech that de-emphasises the pause between don't and know even further)@@@@1@39@@danf@17-8-2009 10241600@unknown@formal@none@1@S@==== Characteristics of intonation ====@@@@1@5@@danf@17-8-2009 10241610@unknown@formal@none@1@S@English is a strongly stressed language, in that certain syllables, both within words and within phrases, get a relative prominence/loudness during pronunciation while the others do not.@@@@1@27@@danf@17-8-2009 10241620@unknown@formal@none@1@S@The former kind of syllables are said to be ''accentuated/stressed'' and the latter are ''unaccentuated/unstressed''.@@@@1@15@@danf@17-8-2009 10241630@unknown@formal@none@1@S@All good dictionaries of English mark the accentuated syllable(s) by either placing an apostrophe-like ( {{IPA | ˈ}} ) sign either before (as in [[International Phonetic Alphabet|IPA]], [[Oxford English Dictionary]], or [[Merriam-Webster]] dictionaries) or after (as in many other dictionaries) the syllable where the stress accent falls.@@@@1@47@@danf@17-8-2009 10241640@unknown@formal@none@1@S@Hence in a sentence, each tone group can be subdivided into syllables, which can either be stressed (strong) or unstressed (weak).@@@@1@21@@danf@17-8-2009 10241650@unknown@formal@none@1@S@The stressed syllable is called the nuclear syllable.@@@@1@8@@danf@17-8-2009 10241660@unknown@formal@none@1@S@For example:@@@@1@2@@danf@17-8-2009 10241670@unknown@formal@none@1@S@: ''That | was | the | '''best''' | thing | you | could | have | '''done'''!''@@@@1@18@@danf@17-8-2009 10241680@unknown@formal@none@1@S@Here, all syllables are unstressed, except the syllables/words ''best'' and ''done'', which are stressed.@@@@1@14@@danf@17-8-2009 10241690@unknown@formal@none@1@S@''Best'' is stressed harder and, therefore, is the nuclear syllable.@@@@1@10@@danf@17-8-2009 10241700@unknown@formal@none@1@S@The 
nuclear syllable carries the main point the speaker wishes to make.@@@@1@12@@danf@17-8-2009 10241710@unknown@formal@none@1@S@For example:@@@@1@2@@danf@17-8-2009 10241720@unknown@formal@none@1@S@: ''John'' had not stolen that money. (...@@@@1@8@@danf@17-8-2009 10241730@unknown@formal@none@1@S@Someone else had.)@@@@1@3@@danf@17-8-2009 10241740@unknown@formal@none@1@S@: John ''had not'' stolen that money. (...@@@@1@8@@danf@17-8-2009 10241750@unknown@formal@none@1@S@Someone said he had. or ...@@@@1@6@@danf@17-8-2009 10241760@unknown@formal@none@1@S@Not at that time, but later he did.)@@@@1@8@@danf@17-8-2009 10241770@unknown@formal@none@1@S@: John had not ''stolen'' that money. (...@@@@1@8@@danf@17-8-2009 10241780@unknown@formal@none@1@S@He acquired the money by some other means.)@@@@1@8@@danf@17-8-2009 10241790@unknown@formal@none@1@S@: John had not stolen ''that'' money. (...@@@@1@8@@danf@17-8-2009 10241800@unknown@formal@none@1@S@He had stolen some other money.)@@@@1@6@@danf@17-8-2009 10241810@unknown@formal@none@1@S@: John had not stolen that ''money''. (...@@@@1@8@@danf@17-8-2009 10241820@unknown@formal@none@1@S@He had stolen something else.)@@@@1@5@@danf@17-8-2009 10241830@unknown@formal@none@1@S@Also@@@@1@1@@danf@17-8-2009 10241840@unknown@formal@none@1@S@: ''I'' did not tell her that. (...@@@@1@8@@danf@17-8-2009 10241850@unknown@formal@none@1@S@Someone else told her)@@@@1@4@@danf@17-8-2009 10241860@unknown@formal@none@1@S@: I ''did not'' tell her that. (...@@@@1@8@@danf@17-8-2009 10241870@unknown@formal@none@1@S@You said I did. or ... but now I will)@@@@1@10@@danf@17-8-2009 10241880@unknown@formal@none@1@S@: I did not ''tell'' her that. (...@@@@1@8@@danf@17-8-2009 10241890@unknown@formal@none@1@S@I did not say it; she could have inferred it, etc)@@@@1@11@@danf@17-8-2009 10241900@unknown@formal@none@1@S@: I did not tell ''her'' that. (...@@@@1@8@@danf@17-8-2009 10241910@unknown@formal@none@1@S@I told someone else)@@@@1@4@@danf@17-8-2009 10241920@unknown@formal@none@1@S@: I did not tell her ''that''. (...@@@@1@8@@danf@17-8-2009 10241930@unknown@formal@none@1@S@I told her something else)@@@@1@5@@danf@17-8-2009 10241940@unknown@formal@none@1@S@This can also be used to express emotion:@@@@1@8@@danf@17-8-2009 10241950@unknown@formal@none@1@S@: ''Oh'' really? (...I did not know that)@@@@1@8@@danf@17-8-2009 10241960@unknown@formal@none@1@S@: Oh ''really''? (...I disbelieve you. 
or ...@@@@1@8@@danf@17-8-2009 10241970@unknown@formal@none@1@S@That's blatantly obvious)@@@@1@3@@danf@17-8-2009 10241980@unknown@formal@none@1@S@The nuclear syllable is spoken more loudly than the others and has a characteristic '''change of pitch'''.@@@@1@17@@danf@17-8-2009 10241990@unknown@formal@none@1@S@The changes of pitch most commonly encountered in English are the '''rising pitch''' and the '''falling pitch''', although the '''fall-rising pitch''' and/or the '''rise-falling pitch''' are sometimes used.@@@@1@28@@danf@17-8-2009 10242000@unknown@formal@none@1@S@In this opposition between falling and rising pitch, which plays a larger role in English than in most other languages, falling pitch conveys certainty and rising pitch uncertainty.@@@@1@28@@danf@17-8-2009 10242010@unknown@formal@none@1@S@This can have a crucial impact on meaning, specifically in relation to polarity, the positive–negative opposition; thus, falling pitch means "polarity known", while rising pitch means "polarity unknown".@@@@1@28@@danf@17-8-2009 10242020@unknown@formal@none@1@S@This underlies the rising pitch of yes/no questions.@@@@1@8@@danf@17-8-2009 10242030@unknown@formal@none@1@S@For example:@@@@1@2@@danf@17-8-2009 10242040@unknown@formal@none@1@S@: ''When do you want to be paid?''@@@@1@8@@danf@17-8-2009 10242050@unknown@formal@none@1@S@: ''Now?''@@@@1@2@@danf@17-8-2009 10242060@unknown@formal@none@1@S@(Rising pitch.@@@@1@2@@danf@17-8-2009 10242070@unknown@formal@none@1@S@In this case, it denotes a question: "Can I be paid now?" or "Do you desire to pay now?")@@@@1@19@@danf@17-8-2009 10242080@unknown@formal@none@1@S@: ''Now.''@@@@1@2@@danf@17-8-2009 10242090@unknown@formal@none@1@S@(Falling pitch.@@@@1@2@@danf@17-8-2009 10242100@unknown@formal@none@1@S@In this case, it denotes a statement: "I choose to be paid now.")@@@@1@13@@danf@17-8-2009 10242110@unknown@formal@none@1@S@== Grammar ==@@@@1@3@@danf@17-8-2009 10242120@unknown@formal@none@1@S@English grammar has minimal [[inflection]] compared with most other [[Indo-European languages]].@@@@1@11@@danf@17-8-2009 10242130@unknown@formal@none@1@S@For example, Modern English, unlike Modern German or Dutch and the [[Romance languages]], lacks [[grammatical gender]] and [[Agreement (linguistics)|adjectival agreement]].@@@@1@20@@danf@17-8-2009 10242140@unknown@formal@none@1@S@[[Grammatical case|Case]] marking has almost disappeared from the language and mainly survives in [[pronoun]]s.@@@@1@14@@danf@17-8-2009 10242150@unknown@formal@none@1@S@The patterning of [[Strong inflection|strong]] (e.g. 
''speak/spoke/spoken'') versus [[Germanic weak verb|weak verbs]] inherited from its Germanic origins has declined in importance in modern English, and the remnants of inflection (such as [[plural]] marking) have become more regular.@@@@1@37@@danf@17-8-2009 10242160@unknown@formal@none@1@S@At the same time, the language has become more [[Isolating language|analytic]], and has developed features such as [[modal verb]]s and [[word order]] as resources for conveying meaning.@@@@1@27@@danf@17-8-2009 10242170@unknown@formal@none@1@S@[[Auxiliary verb]]s mark constructions such as questions, negative polarity, the [[Grammatical voice|passive voice]] and progressive [[grammatical aspect|aspect]].@@@@1@17@@danf@17-8-2009 10242180@unknown@formal@none@1@S@== Vocabulary ==@@@@1@3@@danf@17-8-2009 10242190@unknown@formal@none@1@S@The English vocabulary has changed considerably over the centuries.@@@@1@9@@danf@17-8-2009 10242200@unknown@formal@none@1@S@Like many languages deriving from [[Proto-Indo-European language|Proto-Indo-European]] (PIE), many of the most common words in English can trace back their origin (through the Germanic branch) to PIE.@@@@1@27@@danf@17-8-2009 10242210@unknown@formal@none@1@S@Such words include the basic pronouns ''I'', from [[Old English language|Old English]] ''ic'' (cf. Latin ''ego'', Greek ''ego'', Sanskrit ''aham''), ''me'' (cf. Latin ''me'', Greek ''eme'', Sanskrit ''mam''), numbers (e.g. ''one'', ''two'', ''three'', cf. Latin ''unus, duo, tres'', Greek ''oinos'' "ace (on dice)", ''duo, treis''), common family relationships such as mother, father, brother, sister etc. (cf. Greek "meter", Latin "mater", Sanskrit "matṛ"; ''mother''), names of many animals (cf. Sanskrit ''mus'', Greek ''mys'', Latin ''mus''; ''mouse''), and many common verbs (cf. Greek ''gignōmi'', Latin ''gnoscere'', Hittite ''kanes''; ''to know'').@@@@1@88@@danf@17-8-2009 10242220@unknown@formal@none@1@S@Germanic words (generally words of Old English or to a lesser extent Norse origin) tend to be shorter than the Latinate words of English, and more common in ordinary speech.@@@@1@30@@danf@17-8-2009 10242230@unknown@formal@none@1@S@This includes nearly all the basic pronouns, prepositions, conjunctions, modal verbs etc. 
that form the basis of English syntax and grammar.@@@@1@21@@danf@17-8-2009 10242240@unknown@formal@none@1@S@The longer Latinate words are often regarded as more elegant or educated.@@@@1@12@@danf@17-8-2009 10242250@unknown@formal@none@1@S@However, the excessive use of Latinate words is considered at times to be either pretentious or an attempt to [[obfuscation|obfuscate]] an issue.@@@@1@22@@danf@17-8-2009 10242260@unknown@formal@none@1@S@[[George Orwell]]'s [[essay]] "[[Politics and the English Language]]" is critical of this, as well as other perceived misuse of the language.@@@@1@21@@danf@17-8-2009 10242270@unknown@formal@none@1@S@An English speaker is in many cases able to choose between Germanic and Latinate [[synonym]]s: ''come'' or ''arrive''; ''sight'' or ''vision''; ''freedom'' or ''liberty''.@@@@1@24@@danf@17-8-2009 10242280@unknown@formal@none@1@S@In some cases there is a choice between a Germanic derived word (''oversee''), a Latin derived word (''supervise''), and a French word derived from the same Latin word (''survey'').@@@@1@29@@danf@17-8-2009 10242290@unknown@formal@none@1@S@Such synonyms harbor a variety of different meanings and nuances, enabling the speaker to express fine variations or shades of thought.@@@@1@21@@danf@17-8-2009 10242300@unknown@formal@none@1@S@Familiarity with the [[etymology]] of groups of synonyms can give English speakers greater control over their [[Register (sociolinguistics)|linguistic register]].@@@@1@19@@danf@17-8-2009 10242310@unknown@formal@none@1@S@See: [[List of Germanic and Latinate equivalents in English]].@@@@1@9@@danf@17-8-2009 10242320@unknown@formal@none@1@S@An exception to this and a peculiarity perhaps unique to English is that the nouns for meats are commonly different from, and unrelated to, those for the animals from which they are produced, the animal commonly having a Germanic name and the meat having a French-derived one.@@@@1@47@@danf@17-8-2009 10242330@unknown@formal@none@1@S@Examples include: ''[[deer]]'' and ''[[venison]]''; ''[[cattle|cow]]'' and ''[[beef]]''; ''swine''/''[[pig]]'' and ''[[pork]]'', or ''[[domestic sheep|sheep]]'' and ''[[lamb and mutton|mutton]]''.@@@@1@18@@danf@17-8-2009 10242340@unknown@formal@none@1@S@This is assumed to be a result of the aftermath of the Norman invasion, where a French-speaking elite were the consumers of the meat, produced by Anglo-Saxon lower classes.@@@@1@29@@danf@17-8-2009 10242350@unknown@formal@none@1@S@Since the majority of words used in informal settings will normally be Germanic, such words are often the preferred choices when a speaker wishes to make a point in an argument in a very direct way.@@@@1@36@@danf@17-8-2009 10242360@unknown@formal@none@1@S@A majority of Latinate words (or at least a majority of content words) will normally be used in more formal speech and writing, such as a [[court]]room or an [[encyclopedia]] article.@@@@1@31@@danf@17-8-2009 10242370@unknown@formal@none@1@S@However, there are other Latinate words that are used normally in everyday speech and do not sound formal; these are mainly words for concepts that no longer have Germanic words, and are generally assimilated better and in many cases do not appear Latinate.@@@@1@43@@danf@17-8-2009 10242380@unknown@formal@none@1@S@For instance, the words ''mountain'', ''valley'', ''river'', ''aunt'', ''uncle'', ''move'', ''use'', ''push'' and ''stay'' are all Latinate.@@@@1@17@@danf@17-8-2009 10242390@unknown@formal@none@1@S@English easily accepts technical terms into common usage and often imports new words 
and phrases.@@@@1@15@@danf@17-8-2009 10242400@unknown@formal@none@1@S@Examples of this phenomenon include: ''[[HTTP cookie|cookie]]'', ''[[Internet]]'' and ''[[Uniform Resource Locator|URL]]'' (technical terms), as well as ''[[genre]]'', ''[[über]]'', ''[[lingua franca]]'' and ''amigo'' (imported words/phrases from French, German, modern Latin, and Spanish, respectively).@@@@1@33@@danf@17-8-2009 10242410@unknown@formal@none@1@S@In addition, [[slang]] often provides new meanings for old words and phrases.@@@@1@12@@danf@17-8-2009 10242420@unknown@formal@none@1@S@In fact, this fluidity is so pronounced that a distinction often needs to be made between formal forms of English and contemporary usage.@@@@1@23@@danf@17-8-2009 10242430@unknown@formal@none@1@S@See also: [[sociolinguistics]].@@@@1@3@@danf@17-8-2009 10242440@unknown@formal@none@1@S@=== Number of words in English ===@@@@1@7@@danf@17-8-2009 10242450@unknown@formal@none@1@S@The ''General Explanations'' at the beginning of the ''Oxford English Dictionary'' states:@@@@1@12@@danf@17-8-2009 10242460@unknown@formal@none@1@S@The vocabulary of English is undoubtedly vast, but assigning a specific number to its size is more a matter of definition than of calculation.@@@@1@24@@danf@17-8-2009 10242470@unknown@formal@none@1@S@Unlike other languages, such as [[Académie française|French]], [[List of language regulators|German]], [[Real Academia Española|Spanish]] and [[Accademia della Crusca|Italian]] there is no [[List of language regulators|Academy]] to define officially accepted words and spellings.@@@@1@32@@danf@17-8-2009 10242480@unknown@formal@none@1@S@[[Neologism]]s are coined regularly in medicine, science and technology and other fields, and new [[slang]] is constantly developed.@@@@1@18@@danf@17-8-2009 10242490@unknown@formal@none@1@S@Some of these new words enter wide usage; others remain restricted to small circles.@@@@1@14@@danf@17-8-2009 10242500@unknown@formal@none@1@S@Foreign words used in immigrant communities often make their way into wider English usage.@@@@1@14@@danf@17-8-2009 10242510@unknown@formal@none@1@S@Archaic, dialectal, and regional words might or might not be widely considered as "English".@@@@1@14@@danf@17-8-2009 10242520@unknown@formal@none@1@S@The ''[[Oxford English Dictionary]],'' 2nd edition ''(OED2)'' includes over 600,000 definitions, following a rather inclusive policy:@@@@1@16@@danf@17-8-2009 10242530@unknown@formal@none@1@S@The editors of ''[[Webster's Dictionary|Webster's Third New International Dictionary, Unabridged]]'' (475,000 main headwords) in their preface, estimate the number to be much higher.@@@@1@23@@danf@17-8-2009 10242540@unknown@formal@none@1@S@It is estimated that about 25,000 words are added to the language each year.@@@@1@14@@danf@17-8-2009 10242550@unknown@formal@none@1@S@=== Word origins ===@@@@1@4@@danf@17-8-2009 10242560@unknown@formal@none@1@S@One of the consequences of the French influence is that the vocabulary of English is, to a certain extent, divided between those words which are [[Germanic languages|Germanic]] (mostly West Germanic, with a smaller influence from the North Germanic branch) and those which are "Latinate" (Latin-derived, either directly or from Norman French or other Romance languages).@@@@1@55@@danf@17-8-2009 10242570@unknown@formal@none@1@S@Numerous sets of statistics have been proposed to demonstrate the origins of English vocabulary.@@@@1@14@@danf@17-8-2009 10242580@unknown@formal@none@1@S@None, as yet, is considered definitive by most 
linguists.@@@@1@9@@danf@17-8-2009 10242590@unknown@formal@none@1@S@A computerised survey of about 80,000 words in the old ''Shorter Oxford Dictionary'' (3rd ed.) was published in ''Ordered Profusion'' by Thomas Finkenstaedt and Dieter Wolff (1973) that estimated the origin of English words as follows:@@@@1@36@@danf@17-8-2009 10242600@unknown@formal@none@1@S@*''[[Langues d'oïl|Langue d'oïl]]'', including French and [[Old Norman]]: [[List of English words of French origin|28.3%]]@@@@1@15@@danf@17-8-2009 10242610@unknown@formal@none@1@S@*Latin, including modern scientific and technical Latin: 28.24%@@@@1@8@@danf@17-8-2009 10242620@unknown@formal@none@1@S@*Other [[Germanic languages]] (including words directly inherited from [[Old English language|Old English]]): 25%@@@@1@13@@danf@17-8-2009 10242630@unknown@formal@none@1@S@*Greek: 5.32%@@@@1@2@@danf@17-8-2009 10242640@unknown@formal@none@1@S@*No etymology given: 4.03%@@@@1@4@@danf@17-8-2009 10242650@unknown@formal@none@1@S@*Derived from proper names: 3.28%@@@@1@5@@danf@17-8-2009 10242660@unknown@formal@none@1@S@*All other languages contributed less than 1%@@@@1@7@@danf@17-8-2009 10242670@unknown@formal@none@1@S@A survey by [[Joseph M. Williams]] in ''Origins of the English Language'' of 10,000 words taken from several thousand business letters gave this set of statistics:@@@@1@26@@danf@17-8-2009 10242680@unknown@formal@none@1@S@*French (langue d'oïl): 41%@@@@1@4@@danf@17-8-2009 10242690@unknown@formal@none@1@S@*"Native" English: 33%@@@@1@3@@danf@17-8-2009 10242700@unknown@formal@none@1@S@*Latin: 15%@@@@1@2@@danf@17-8-2009 10242710@unknown@formal@none@1@S@*Danish: 2%@@@@1@2@@danf@17-8-2009 10242720@unknown@formal@none@1@S@*Dutch: 1%@@@@1@2@@danf@17-8-2009 10242730@unknown@formal@none@1@S@*Other: 10%@@@@1@2@@danf@17-8-2009 10242740@unknown@formal@none@1@S@However, 83% of the 1,000 most-common, and all of the 100 most-common English words are Germanic.@@@@1@16@@danf@17-8-2009 10242750@unknown@formal@none@1@S@==== Dutch origins ====@@@@1@4@@danf@17-8-2009 10242760@unknown@formal@none@1@S@Words describing the navy, types of ships, and other objects or activities on the water are often from Dutch origin.@@@@1@20@@danf@17-8-2009 10242770@unknown@formal@none@1@S@''Yacht'' (''jacht'') and ''cruiser'' (''kruiser'') are examples.@@@@1@7@@danf@17-8-2009 10242780@unknown@formal@none@1@S@==== French origins ====@@@@1@4@@danf@17-8-2009 10242790@unknown@formal@none@1@S@There are many [[List of English words of French origin|words of French origin in English]], such as ''competition'', ''art'', ''table'', ''publicity'', ''police'', ''role'', ''routine'', ''machine'', ''force'', and many others that have been and are being [[anglicisation|anglicised]]; they are now pronounced according to English rules of [[phonology]], rather than French.@@@@1@49@@danf@17-8-2009 10242800@unknown@formal@none@1@S@A large portion of English vocabulary is of French or [[Langues d'oïl]] origin, most derived from, or transmitted via, the [[Anglo-Norman language|Anglo-Norman]] spoken by the [[upper class]]es in [[England]] for several hundred years after the [[Norman conquest of England]].@@@@1@39@@danf@17-8-2009 10242810@unknown@formal@none@1@S@== Writing system ==@@@@1@4@@danf@17-8-2009 10242820@unknown@formal@none@1@S@English has been written using the [[Latin alphabet]] since around the ninth century.@@@@1@13@@danf@17-8-2009 10242830@unknown@formal@none@1@S@(Before that, Old English had been written using [[Anglo-Saxon runes]].)@@@@1@10@@danf@17-8-2009 
10242840@unknown@formal@none@1@S@The spelling system, or [[orthography]], is multilayered, with elements of French, Latin and Greek spelling on top of the native Germanic system; it has grown to vary significantly from the [[phonology]] of the language.@@@@1@34@@danf@17-8-2009 10242850@unknown@formal@none@1@S@The spelling of words often diverges considerably from how they are spoken.@@@@1@12@@danf@17-8-2009 10242860@unknown@formal@none@1@S@Though letters and sounds may not correspond in isolation, spelling rules that take into account syllable structure, phonetics, and accents are 75% or more reliable.@@@@1@25@@danf@17-8-2009 10242870@unknown@formal@none@1@S@Some phonics spelling advocates claim that English is more than 80% phonetic.@@@@1@12@@danf@17-8-2009 10242880@unknown@formal@none@1@S@In general, [[history of the English language|the English language]], being the product of many other languages and having only been codified orthographically in the 16th century, has fewer consistent relationships between sounds and letters than many other languages.@@@@1@38@@danf@17-8-2009 10242890@unknown@formal@none@1@S@The consequence of this orthographic history is that reading can be challenging.@@@@1@12@@danf@17-8-2009 10242900@unknown@formal@none@1@S@It takes longer for students to become completely fluent readers of English than of many other languages, including French, Greek, and Spanish.@@@@1@22@@danf@17-8-2009 10242910@unknown@formal@none@1@S@=== Basic sound-letter correspondence ===@@@@1@5@@danf@17-8-2009 10242920@unknown@formal@none@1@S@Only the consonant letters are pronounced in a relatively regular way:@@@@1@11@@danf@17-8-2009 10242930@unknown@formal@none@1@S@=== Written accents ===@@@@1@4@@danf@17-8-2009 10242940@unknown@formal@none@1@S@Unlike most other Germanic languages, English has almost no [[diacritic]]s except in foreign [[loanword]]s (like the [[acute accent]] in ''café''), and in the uncommon use of a [[diaeresis]] mark (often in formal writing) to indicate that two vowels are pronounced separately, rather than as one sound (e.g. 
''naïve, Zoë'').@@@@1@49@@danf@17-8-2009 10242950@unknown@formal@none@1@S@It is almost always acceptable to leave out the marks, especially in digital communications where the [[QWERTY]] keyboard lacks any marked letters, but it depends on the context where the word is used.@@@@1@33@@danf@17-8-2009 10242960@unknown@formal@none@1@S@Some English words retain the diacritic to distinguish them from others, such as ''[[Animé (oleo-resin)|animé]], [[Investigative journalism|exposé]], [[Lamé (fencing)|lamé]], [[öre]], [[øre]], [[pâté]], [[piqué]],'' and ''[[rosé]]'', though these are sometimes also dropped (''[[résumé]]/resumé'' is usually spelled ''resume'' in the United States).@@@@1@40@@danf@17-8-2009 10242970@unknown@formal@none@1@S@There are loan words which occasionally use a diacritic to represent their pronunciation that is not in the original word, such as ''maté'', from Spanish ''[[yerba mate]]'', following the French usage, but they are extremely rare.@@@@1@36@@danf@17-8-2009 10242980@unknown@formal@none@1@S@== Formal written English ==@@@@1@5@@danf@17-8-2009 10242990@unknown@formal@none@1@S@A version of the language almost universally agreed upon by educated English speakers around the world is called [[formal written English]].@@@@1@21@@danf@17-8-2009 10243000@unknown@formal@none@1@S@It takes virtually the same form no matter where in the English-speaking world it is written.@@@@1@16@@danf@17-8-2009 10243010@unknown@formal@none@1@S@In spoken English, by contrast, there are a vast number of differences between [[dialect]]s, [[Accent (linguistics)|accents]], and varieties of [[slang]], colloquial and regional expressions.@@@@1@24@@danf@17-8-2009 10243020@unknown@formal@none@1@S@In spite of this, local variations in the formal written version of the language are quite limited, being restricted largely to the [[American and British English spelling differences|spelling differences between British and American English]].@@@@1@34@@danf@17-8-2009 10243030@unknown@formal@none@1@S@== Basic and simplified versions ==@@@@1@6@@danf@17-8-2009 10243040@unknown@formal@none@1@S@To make English easier to read, there are some simplified versions of the language.@@@@1@14@@danf@17-8-2009 10243050@unknown@formal@none@1@S@One basic version is named ''[[Basic English]]'', a [[constructed language]] with a small number of words created by [[Charles Kay Ogden]] and described in his book ''Basic English: A General Introduction with Rules and Grammar'' (1930).@@@@1@36@@danf@17-8-2009 10243060@unknown@formal@none@1@S@The language is based on a simplified version of English.@@@@1@10@@danf@17-8-2009 10243070@unknown@formal@none@1@S@Ogden said that it would take seven years to learn English, seven months for [[Esperanto]], and seven weeks for Basic English, comparable with [[Ido]].@@@@1@24@@danf@17-8-2009 10243080@unknown@formal@none@1@S@Thus Basic English is used by companies who need to make complex books for international use, and by language schools that need to give people some knowledge of English in a short time.@@@@1@33@@danf@17-8-2009 10243090@unknown@formal@none@1@S@Ogden did not put any words into Basic English that could be said with a few other words and he worked to make the words work for speakers of any other language.@@@@1@32@@danf@17-8-2009 10243100@unknown@formal@none@1@S@He put his set of words through a large number of tests and adjustments.@@@@1@14@@danf@17-8-2009 10243110@unknown@formal@none@1@S@He also made the grammar simpler, but tried to keep the grammar normal for English 
users.@@@@1@16@@danf@17-8-2009 10243120@unknown@formal@none@1@S@The concept gained its greatest publicity just after the [[World War II|Second World War]] as a tool for world peace.@@@@1@20@@danf@17-8-2009 10243130@unknown@formal@none@1@S@Although it was not built into a program, similar simplifications were devised for various international uses.@@@@1@16@@danf@17-8-2009 10243140@unknown@formal@none@1@S@Another version, [[Simplified English]], exists, which is a [[Controlled natural language|controlled language]] originally developed for [[aerospace]] industry maintenance manuals.@@@@1@19@@danf@17-8-2009 10243150@unknown@formal@none@1@S@It offers a carefully limited and standardised subset of English.@@@@1@10@@danf@17-8-2009 10243160@unknown@formal@none@1@S@Simplified English has a lexicon of approved words and those words can only be used in certain ways.@@@@1@18@@danf@17-8-2009 10243170@unknown@formal@none@1@S@For example, the word ''close'' can be used in the phrase "Close the door" but not "do not go close to the landing gear".@@@@1@24@@danf@17-8-2009 10250010@unknown@formal@none@1@S@
Esperanto
@@@@1@1@@danf@17-8-2009 10250020@unknown@formal@none@1@S@is by far the most widely spoken [[constructed language|constructed]] [[international auxiliary language]] in the world.@@@@1@15@@danf@17-8-2009 10250030@unknown@formal@none@1@S@Its name derives from ''Doktoro Esperanto,'' the [[pseudonym]] under which [[L. L. Zamenhof]] published the first book detailing Esperanto, the ''[[Unua Libro]],'' in 1887.@@@@1@24@@danf@17-8-2009 10250040@unknown@formal@none@1@S@The word ''esperanto'' means 'one who hopes' in the language itself.@@@@1@11@@danf@17-8-2009 10250050@unknown@formal@none@1@S@Zamenhof's goal was to create an easy and flexible language that would serve as a universal [[second language]] to foster peace and international understanding.@@@@1@24@@danf@17-8-2009 10250060@unknown@formal@none@1@S@Esperanto has had continuous usage by a community estimated at between 100,000 and 2 million speakers for over a century.@@@@1@20@@danf@17-8-2009 10250070@unknown@formal@none@1@S@By most estimates, there are approximately one thousand [[Native Esperanto speakers|native speakers]].@@@@1@12@@danf@17-8-2009 10250080@unknown@formal@none@1@S@However, no country has adopted the language [[official language|officially]].@@@@1@9@@danf@17-8-2009 10250090@unknown@formal@none@1@S@Today, Esperanto is employed in world travel, correspondence, cultural exchange, conventions, literature, language instruction, television, and radio broadcasting.@@@@1@18@@danf@17-8-2009 10250100@unknown@formal@none@1@S@Also, there is an [[Esperanto Wikipedia]] that contains over 100,000 articles as of June 2008.@@@@1@15@@danf@17-8-2009 10250110@unknown@formal@none@1@S@There is evidence that [[Propaedeutic value of Esperanto|learning Esperanto may provide a good foundation for learning languages in general]].@@@@1@19@@danf@17-8-2009 10250120@unknown@formal@none@1@S@Some state education systems offer basic instruction and elective courses in Esperanto.@@@@1@12@@danf@17-8-2009 10250130@unknown@formal@none@1@S@Esperanto is also the language of instruction in one university, the [[Akademio Internacia de la Sciencoj San Marino|Akademio Internacia de la Sciencoj]] in [[San Marino]].@@@@1@25@@danf@17-8-2009 10250140@unknown@formal@none@1@S@== History ==@@@@1@3@@danf@17-8-2009 10250150@unknown@formal@none@1@S@Esperanto was developed in the late 1870s and early 1880s by [[ophthalmology|ophthalmologist]] [[L. L. Zamenhof|Dr. 
Ludovic Lazarus Zamenhof]], an [[Ashkenazi Jew]] from [[Bialystok]], now in [[Poland]] and previously in the [[Polish-Lithuanian Commonwealth]], but at the time part of the [[Russian Empire]].@@@@1@41@@danf@17-8-2009 10250160@unknown@formal@none@1@S@After some ten years of development, which Zamenhof spent translating literature into the language as well as writing original [[prose]] and [[Poetry|verse]], the [[Unua Libro|first book of Esperanto grammar]] was published in [[Warsaw]] in July 1887.@@@@1@36@@danf@17-8-2009 10250170@unknown@formal@none@1@S@The number of speakers grew rapidly over the next few decades, at first primarily in the [[Russian empire]] and [[Eastern Europe]], then in [[Western Europe]], the [[Americas]], [[China]], and [[Japan]].@@@@1@30@@danf@17-8-2009 10250180@unknown@formal@none@1@S@In the early years, speakers of Esperanto kept in contact primarily through correspondence and [[magazine|periodicals]], but in 1905 the first [[World Congress of Esperanto|world congress of Esperanto speakers]] was held in [[Boulogne-sur-Mer]], [[France]].@@@@1@33@@danf@17-8-2009 10250190@unknown@formal@none@1@S@Since then world congresses have been held in different countries every year, except during the two [[world war|World Wars]].@@@@1@19@@danf@17-8-2009 10250200@unknown@formal@none@1@S@Since the Second World War, they have been attended by an average of over 2000 and up to 6000 people.@@@@1@20@@danf@17-8-2009 10250210@unknown@formal@none@1@S@===Relation to 20th-century totalitarianism===@@@@1@4@@danf@17-8-2009 10250220@unknown@formal@none@1@S@As a potential vehicle for international understanding, Esperanto attracted the suspicion of many [[totalitarian]] states.@@@@1@15@@danf@17-8-2009 10250230@unknown@formal@none@1@S@The situation was especially pronounced in [[Nazi Germany]] and in the [[Soviet Union]] under [[Joseph Stalin]].@@@@1@16@@danf@17-8-2009 10250240@unknown@formal@none@1@S@In Germany, there was additional motivation to persecute Esperanto because Zamenhof was a Jew.@@@@1@14@@danf@17-8-2009 10250250@unknown@formal@none@1@S@In his work ''[[Mein Kampf]],'' [[Hitler]] mentioned Esperanto as an example of a language that would be used by an [[International Jewry|International]] [[Jewish conspiracy|Jewish Conspiracy]] once they achieved [[world domination]].@@@@1@30@@danf@17-8-2009 10250260@unknown@formal@none@1@S@[[Esperantist]]s were executed during [[the Holocaust]], with Zamenhof's family in particular singled out for execution.@@@@1@15@@danf@17-8-2009 10250270@unknown@formal@none@1@S@In the early years of the Soviet Union, Esperanto was given a measure of government support, and an officially recognized Soviet Esperanto Association came into being.@@@@1@26@@danf@17-8-2009 10250280@unknown@formal@none@1@S@However, in 1937, Stalin reversed this policy.@@@@1@7@@danf@17-8-2009 10250290@unknown@formal@none@1@S@He denounced Esperanto as "the language of spies" and had Esperantists executed.@@@@1@12@@danf@17-8-2009 10250300@unknown@formal@none@1@S@The use of Esperanto remained illegal until 1956.@@@@1@8@@danf@17-8-2009 10250310@unknown@formal@none@1@S@==Official use==@@@@1@2@@danf@17-8-2009 10250320@unknown@formal@none@1@S@Esperanto has never been an official language of any recognized country.@@@@1@11@@danf@17-8-2009 10250330@unknown@formal@none@1@S@However, there were plans at the beginning of the 20th century to establish [[Moresnet|Neutral Moresnet]] as the world's first Esperanto state.@@@@1@21@@danf@17-8-2009 10250340@unknown@formal@none@1@S@In China, there was talk 
in some circles after the 1911 [[Xinhai Revolution]] about officially replacing [[Chinese language|Chinese]] with Esperanto as a means to dramatically bring the country into the twentieth century, though this policy proved untenable.@@@@1@37@@danf@17-8-2009 10250350@unknown@formal@none@1@S@In the summer of 1924, the [[American Radio Relay League]] adopted Esperanto as its official [[international auxiliary language]], and hoped that the language would be used by [[Amateur radio|radio amateurs]] in international communications, but its actual use for radio communications was negligible.@@@@1@42@@danf@17-8-2009 10250360@unknown@formal@none@1@S@In addition, the self-proclaimed [[artificial island]] [[micronation]] of [[Republic of Rose Island|Rose Island]] used Esperanto as its official language in 1968.@@@@1@21@@danf@17-8-2009 10250370@unknown@formal@none@1@S@Esperanto is the working language of several [[non-profit organization|non-profit]] international organizations such as the ''[[Sennacieca Asocio Tutmonda]]'', but most others are specifically Esperanto organizations.@@@@1@24@@danf@17-8-2009 10250380@unknown@formal@none@1@S@The largest of these, the [[World Esperanto Association]], has an official consultative relationship with the [[United Nations]] and [[UNESCO]].@@@@1@19@@danf@17-8-2009 10250390@unknown@formal@none@1@S@The U.S. Army has published military phrasebooks in Esperanto, to be used in [[Military simulation|wargames]] by mock enemy forces.@@@@1@19@@danf@17-8-2009 10250400@unknown@formal@none@1@S@Esperanto is also the first language of teaching and administration of the [[Akademio Internacia de la Sciencoj San Marino|International Academy of Sciences San Marino]], which is sometimes called an "Esperanto University".@@@@1@31@@danf@17-8-2009 10250410@unknown@formal@none@1@S@== Linguistic properties ==@@@@1@4@@danf@17-8-2009 10250420@unknown@formal@none@1@S@=== Classification ===@@@@1@3@@danf@17-8-2009 10250430@unknown@formal@none@1@S@As a [[constructed language]], Esperanto is not [[Genealogy|genealogically]] related to any [[ethnic group|ethnic]] language.@@@@1@14@@danf@17-8-2009 10250440@unknown@formal@none@1@S@It has been described as "a language [[lexicon|lexically]] predominantly [[Romance languages|Romanic]], [[morphology (linguistics)|morphologically]] intensively [[agglutination|agglutinative]] and to a certain degree [[isolating languages|isolating]] in character".@@@@1@24@@danf@17-8-2009 10250450@unknown@formal@none@1@S@The [[phonology]], [[grammar]], [[vocabulary]], and [[semantics]] are based on the western [[Indo-European languages]].@@@@1@13@@danf@17-8-2009 10250460@unknown@formal@none@1@S@The [[phoneme|phonemic inventory]] is essentially [[Slavic languages|Slavic]], as is much of the semantics, while the [[vocabulary]] derives primarily from the [[Romance languages]], with a lesser contribution from the [[Germanic languages]].@@@@1@30@@danf@17-8-2009 10250470@unknown@formal@none@1@S@[[Pragmatics]] and other aspects of the language not specified by Zamenhof's original documents were influenced by the native languages of early speakers, primarily [[Russian language|Russian]], [[Polish language|Polish]], [[German language|German]], and [[French language|French]].@@@@1@32@@danf@17-8-2009 10250480@unknown@formal@none@1@S@[[Linguistic typology|Typologically]], Esperanto has [[preposition]]s and a [[information flow|pragmatic word order]] that by default is ''[[Subject Verb Object]]'' and ''[[Word order|Adjective Noun]]''.@@@@1@22@@danf@17-8-2009 
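The agglutinative word building mentioned in the classification above is regular enough to be sketched mechanically. The following minimal Python sketch is only an illustration under assumed inputs: the root ''san-'' ("health") and the English glosses are chosen for the example, while the endings and affixes used (''-o'' noun, ''-a'' adjective, ''-e'' adverb, ''-j'' plural, ''-n'' accusative, ''mal-'' "opposite", ''-ul-'' "person") follow the regular scheme described in the Grammar section below.
 # Minimal sketch of Esperanto's regular, agglutinative word building.
 # The root and glosses are illustrative; the endings follow the scheme
 # described in the Grammar section (-o noun, -a adjective, -e adverb,
 # -j plural, -n accusative, mal- "opposite", -ul- "person").
 ROOT = "san"  # 'health-' (illustrative root)
 def noun(stem):       return stem + "o"    # sano  'health'
 def adjective(stem):  return stem + "a"    # sana  'healthy'
 def adverb(stem):     return stem + "e"    # sane  'healthily'
 def opposite(stem):   return "mal" + stem  # malsan- 'ill-'
 def person(stem):     return stem + "ul"   # sanul- 'healthy person'
 def plural(word):     return word + "j"
 def accusative(word): return word + "n"
 print(adjective(ROOT))                 # sana    'healthy'
 print(adjective(opposite(ROOT)))       # malsana 'ill'
 print(noun(person(ROOT)))              # sanulo  'a healthy person'
 print(accusative(plural(noun(ROOT))))  # sanojn  (plural direct object)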
10250490@unknown@formal@none@1@S@New words are formed through extensive [[prefix (linguistics)|prefix]]ing and [[suffix]]ing.@@@@1@10@@danf@17-8-2009 10250500@unknown@formal@none@1@S@=== Writing system ===@@@@1@4@@danf@17-8-2009 10250510@unknown@formal@none@1@S@Esperanto is written with a modified version of the [[Latin alphabet]], including six [[Letter (alphabet)|letters]] with [[diacritic]]s: [[c-circumflex|ĉ]], [[g-circumflex|ĝ]], [[h-circumflex|ĥ]], [[j-circumflex|ĵ]], [[s-circumflex|ŝ]] and [[u-breve|ŭ]] (that is, ''c, g, h, j, s'' [[circumflex]], and ''u'' [[breve]]).@@@@1@35@@danf@17-8-2009 10250520@unknown@formal@none@1@S@The alphabet does not include the letters ''q, w, x,'' or ''y'' except in unassimilated foreign names.@@@@1@17@@danf@17-8-2009 10250530@unknown@formal@none@1@S@The 28-letter alphabet is:
'''a b c ĉ d e f g ĝ h ĥ i j ĵ k l m n o p r s ŝ t u ŭ v z'''
@@@@1@32@@danf@17-8-2009 10250540@unknown@formal@none@1@S@All letters are pronounced approximately as in the [[IPA]], with the exception of ''c'' and the accented letters:@@@@1@18@@danf@17-8-2009 10250550@unknown@formal@none@1@S@Two [[ASCII]]-compatible writing conventions are in use.@@@@1@7@@danf@17-8-2009 10250560@unknown@formal@none@1@S@These substitute [[Digraph (orthography)|digraph]]s for the accented letters.@@@@1@8@@danf@17-8-2009 10250570@unknown@formal@none@1@S@The original "h-convention" (''ch, gh, hh, jh, sh, u'') is based on English 'ch' and 'sh', while a more recent "[[x-convention]]" (''cx, gx, hx, jx, sx, ux'') is useful for alphabetic word sorting on a [[computer]] (''cx'' comes correctly after ''cu'', ''sx'' after ''sv'', etc.) as well as for simple conversion back into the standard [[orthography]].@@@@1@56@@danf@17-8-2009 10250580@unknown@formal@none@1@S@Another scheme represents the superscripted letters by a [[caret]] (^), as for example: c^ or ^c.@@@@1@16@@danf@17-8-2009 10250590@unknown@formal@none@1@S@=== Phonology ===@@@@1@3@@danf@17-8-2009 10250600@unknown@formal@none@1@S@:''(For help with the phonetic symbols, see [[Help:IPA]])''@@@@1@8@@danf@17-8-2009 10250610@unknown@formal@none@1@S@Esperanto has 22 [[consonant]]s, 5 [[vowel]]s, and two [[semivowel]]s, which combine with the vowels to form 6 [[diphthong]]s.@@@@1@18@@danf@17-8-2009 10250620@unknown@formal@none@1@S@(The consonant {{IPA|/j/}} and semivowel {{IPA|/i̯/}} are both written .)@@@@1@10@@danf@17-8-2009 10250625@unknown@formal@none@1@S@[[tone (linguistics)|Tone]] is not used to distinguish meanings of words.@@@@1@10@@danf@17-8-2009 10250630@unknown@formal@none@1@S@[[Stress (linguistics)|Stress]] is always on the penultimate vowel, unless a final vowel ''o'' is [[Elision|elided]], a practice which occurs mostly in [[poetry]].@@@@1@22@@danf@17-8-2009 10250640@unknown@formal@none@1@S@For example, ''familio'' "family" is stressed {{IPA2|fa.mi.ˈli.o}}, but when found without the final o, ''famili’,'' the stress does not shift: {{IPA|[fa.mi.ˈli]}}.@@@@1@21@@danf@17-8-2009 10250650@unknown@formal@none@1@S@==== Consonants ====@@@@1@3@@danf@17-8-2009 10250660@unknown@formal@none@1@S@The 22 consonants are:@@@@1@4@@danf@17-8-2009 10250670@unknown@formal@none@1@S@The sound {{IPA|/r/}} is usually [[alveolar trill|rolled]], but may be [[alveolar flap|tapped]] {{IPA|[ɾ]}}.@@@@1@13@@danf@17-8-2009 10250680@unknown@formal@none@1@S@The {{IPA|/v/}} has a normative pronunciation like an [[English language|English]] ''v,'' but is sometimes somewhere between a ''v'' and a ''w,'' {{IPA|[ʋ]}}, depending on the language background of the speaker.@@@@1@30@@danf@17-8-2009 10250690@unknown@formal@none@1@S@A semivowel {{IPA|/u̯/}} normally occurs only in [[diphthong]]s after the vowels {{IPA|/a/}} and {{IPA|/e/}}, not as a consonant {{IPA|*/w/}}.@@@@1@19@@danf@17-8-2009 10250700@unknown@formal@none@1@S@Common, if debated, [[assimilation (linguistics)|assimilation]] includes the pronunciation of {{IPA|/nk/}} as {{IPA|[ŋk]}}, as in English ''sink,'' and {{IPA|/kz/}} as {{IPA|[gz]}}, like the ''x'' in English ''example''.@@@@1@26@@danf@17-8-2009 10250710@unknown@formal@none@1@S@A large number of consonant clusters can occur, up to three in initial position and four in medial position, as in ''instrui'' "to teach".@@@@1@24@@danf@17-8-2009 10250720@unknown@formal@none@1@S@Final clusters are uncommon except in foreign names, poetic elision of final ''o,'' and a very few basic words such as ''cent'' "hundred" and ''post'' 
"after".@@@@1@26@@danf@17-8-2009 10250730@unknown@formal@none@1@S@====Vowels====@@@@1@1@@danf@17-8-2009 10250740@unknown@formal@none@1@S@Esperanto has the five [[cardinal vowels]] of [[Spanish language|Spanish]], [[Swahili language|Swahili]], and [[Modern Greek]].@@@@1@14@@danf@17-8-2009 10250750@unknown@formal@none@1@S@There are six falling diphthongs: ''uj, oj, ej, aj, aŭ, eŭ'' ({{IPA|/ui̯, oi̯, ei̯, ai̯, au̯, eu̯/}}).@@@@1@17@@danf@17-8-2009 10250760@unknown@formal@none@1@S@With only five vowels, a good deal of variation is tolerated.@@@@1@11@@danf@17-8-2009 10250770@unknown@formal@none@1@S@For instance, {{IPA|/e/}} commonly ranges from {{IPA|[e]}} (French ''é'') to {{IPA|[ɛ]}} (French ''è'').@@@@1@13@@danf@17-8-2009 10250780@unknown@formal@none@1@S@The details often depend on the speaker's native language.@@@@1@9@@danf@17-8-2009 10250790@unknown@formal@none@1@S@A [[glottal stop]] may occur between adjacent vowels in some people's speech, especially when the two vowels are the same, as in ''heroo'' "hero" ({{IPA|[he.ˈro.o]}} or {{IPA|[he.ˈro.ʔo]}}) and ''praavo'' "great-grandfather" ({{IPA|[pra.ˈa.vo]}} or {{IPA|[pra.ˈʔa.vo]}}).@@@@1@33@@danf@17-8-2009 10250800@unknown@formal@none@1@S@=== Grammar ===@@@@1@3@@danf@17-8-2009 10250810@unknown@formal@none@1@S@Esperanto words are [[Derivation (linguistics)|derived]] by stringing together [[prefix (linguistics)|prefix]]es, [[Root (linguistics)|roots]], and [[suffix]]es.@@@@1@14@@danf@17-8-2009 10250820@unknown@formal@none@1@S@This process is regular, so that people can create new words as they speak and be understood.@@@@1@17@@danf@17-8-2009 10250830@unknown@formal@none@1@S@[[Compound (linguistics)|Compound]] words are formed with a modifier-first, [[head (linguistics)|head-final]] order, the same order as English "birdsong" ''vs.'' "songbird".@@@@1@19@@danf@17-8-2009 10250840@unknown@formal@none@1@S@The different [[Part of speech|parts of speech]] are marked by their own suffixes: all [[common noun]]s end in ''-o,'' all [[adjective]]s in ''-a,'' all derived adverbs in ''-e,'' and all [[verb]]s in one of six [[Grammatical tense|tense]] and [[Grammatical mood|mood]] suffixes, such as [[present tense]] ''-as.''@@@@1@46@@danf@17-8-2009 10250850@unknown@formal@none@1@S@[[Grammatical number|Plural]] nouns end in ''-oj'' (pronounced "oy"), whereas [[direct object]]s end in ''-on.''@@@@1@14@@danf@17-8-2009 10250860@unknown@formal@none@1@S@Plural direct objects end with the combination ''-ojn'' (pronounced to rhyme with "coin"): That is, ''-o'' for a noun, plus ''-j'' for plural, plus ''-n'' for direct object.@@@@1@28@@danf@17-8-2009 10250870@unknown@formal@none@1@S@Adjectives [[Grammatical number#Effect of number on verbs and other parts of speech|agree]] with their nouns; their endings are plural ''-aj'' (pronounced "eye"), direct-object ''-an,'' and plural direct-object ''-ajn'' (pronounced to rhyme with "fine").@@@@1@33@@danf@17-8-2009 10250880@unknown@formal@none@1@S@The suffix ''-n'' is used to indicate the goal of movement and a few other things, in addition to the direct object.@@@@1@22@@danf@17-8-2009 10250890@unknown@formal@none@1@S@See [[Esperanto grammar]] for details.@@@@1@5@@danf@17-8-2009 10250900@unknown@formal@none@1@S@The six verb [[inflection]]s consist of three tenses and three moods.@@@@1@11@@danf@17-8-2009 10250910@unknown@formal@none@1@S@They are [[present tense]] ''-as,'' [[future tense]] ''-os,'' [[past tense]] ''-is,'' [[infinitive|infinitive mood]] ''-i,'' [[conditional mood]] ''-us,'' and [[jussive mood]] ''-u'' (used for 
wishes and commands).@@@@1@26@@danf@17-8-2009 10250920@unknown@formal@none@1@S@Verbs are not marked for person or number.@@@@1@8@@danf@17-8-2009 10250930@unknown@formal@none@1@S@For instance: ''kanti'' "to sing"; ''mi kantas'' "I sing"; ''mi kantis'' "I sang"; ''mi kantos'' "I will sing"; ''li kantas'' "he sings"; ''vi kantas'' "you sing".@@@@1@26@@danf@17-8-2009 10250940@unknown@formal@none@1@S@Word order is comparatively free: Adjectives may precede or follow nouns, and subjects, verbs and objects (marked by the suffix ''-n)'' may occur in any order.@@@@1@26@@danf@17-8-2009 10250950@unknown@formal@none@1@S@However, the [[article (grammar)|article]] ''la'' "the" and [[demonstrative]]s such as ''tiu'' "this, that" almost always come before the noun, and a [[preposition]] such as ''ĉe'' "at" ''must'' come before it.@@@@1@30@@danf@17-8-2009 10250960@unknown@formal@none@1@S@Similarly, the negative ''ne'' "not" and [[conjunction]]s such as ''kaj'' "both, and" and ''ke'' "that" must precede the [[phrase]] or [[clause]] they introduce.@@@@1@23@@danf@17-8-2009 10250970@unknown@formal@none@1@S@In [[copula]]r (A = B) clauses, word order is just as important as it is in English clauses like "people are dogs" ''vs.'' "dogs are people".@@@@1@26@@danf@17-8-2009 10250980@unknown@formal@none@1@S@====Correlatives====@@@@1@1@@danf@17-8-2009 10250990@unknown@formal@none@1@S@A [[correlative]] is a word used to ask or answer a question of who, where, what, when, or how.@@@@1@19@@danf@17-8-2009 10251000@unknown@formal@none@1@S@Correlatives in Esperanto are set out in a systematic manner that correlates a basic [[idea]] (quantity, manner, time, ''etc.'') to a function (questioning, indicating, negating, ''etc.'')@@@@1@26@@danf@17-8-2009 10251010@unknown@formal@none@1@S@Examples:@@@@1@1@@danf@17-8-2009 10251020@unknown@formal@none@1@S@*''Kio estas tio?''@@@@1@3@@danf@17-8-2009 10251030@unknown@formal@none@1@S@"What is this?"@@@@1@3@@danf@17-8-2009 10251040@unknown@formal@none@1@S@*''Kioma estas la horo?''@@@@1@4@@danf@17-8-2009 10251050@unknown@formal@none@1@S@"What time is it?"@@@@1@4@@danf@17-8-2009 10251060@unknown@formal@none@1@S@Note ''kioma'' rather than ''Kiu estas la horo?'' "which is the hour?", when asking for the ranking order of the hour on the clock.@@@@1@24@@danf@17-8-2009 10251070@unknown@formal@none@1@S@*''Io falis el la ŝranko'' "Something fell out of the cupboard."@@@@1@11@@danf@17-8-2009 10251080@unknown@formal@none@1@S@*''Homoj tiaj kiel mi ne konadas timon.''@@@@1@7@@danf@17-8-2009 10251090@unknown@formal@none@1@S@"Men such as me know no fear."@@@@1@7@@danf@17-8-2009 10251100@unknown@formal@none@1@S@Correlatives are declined if the case demands it:@@@@1@8@@danf@17-8-2009 10251110@unknown@formal@none@1@S@*''Vi devas elekti ian vorton pli simpla'' "You should choose a (some kind of) simpler word."@@@@1@16@@danf@17-8-2009 10251120@unknown@formal@none@1@S@''Ia'' receives ''-n'' because it's part of the [[direct object]].@@@@1@10@@danf@17-8-2009 10251130@unknown@formal@none@1@S@*''Kian libron vi volas?''@@@@1@4@@danf@17-8-2009 10251140@unknown@formal@none@1@S@"What sort of book do you want?"@@@@1@7@@danf@17-8-2009 10251150@unknown@formal@none@1@S@Contrast this with, ''Kiun libron vi volas?''@@@@1@7@@danf@17-8-2009 10251160@unknown@formal@none@1@S@"Which book do you want?"@@@@1@5@@danf@17-8-2009 10251170@unknown@formal@none@1@S@=== Vocabulary ===@@@@1@3@@danf@17-8-2009 10251180@unknown@formal@none@1@S@The core vocabulary of Esperanto was defined by ''Lingvo internacia'', published by Zamenhof in 
1887.@@@@1@15@@danf@17-8-2009 10251190@unknown@formal@none@1@S@It comprised 900 roots, which could be expanded into tens of thousands of words with prefixes, suffixes, and compounding.@@@@1@19@@danf@17-8-2009 10251200@unknown@formal@none@1@S@In 1894, Zamenhof published the first Esperanto [[dictionary]], ''Universala Vortaro'', with a larger set of roots.@@@@1@16@@danf@17-8-2009 10251210@unknown@formal@none@1@S@However, the rules of the language allowed speakers to borrow new roots as needed, recommending only that they look for the most international forms, and then derive related meanings from these.@@@@1@31@@danf@17-8-2009 10251220@unknown@formal@none@1@S@Since then, many words have been borrowed, primarily but not solely from the Western European languages.@@@@1@16@@danf@17-8-2009 10251230@unknown@formal@none@1@S@Not all proposed borrowings catch on, but many do, especially [[technical terminology|technical]] and [[science|scientific]] terms.@@@@1@15@@danf@17-8-2009 10251240@unknown@formal@none@1@S@Terms for everyday use, on the other hand, are more likely to be derived from existing roots—for example ''komputilo'' (a computer) from ''komputi'' (to compute) plus the suffix ''-ilo'' (tool)—or to be covered by extending the meanings of existing words (for example ''muso'' (a mouse), as in English, now also means a computer input device).@@@@1@55@@danf@17-8-2009 10251250@unknown@formal@none@1@S@There are frequent debates among Esperanto speakers about whether a particular borrowing is justified or whether the need can be met by deriving from or extending the meaning of existing words.@@@@1@31@@danf@17-8-2009 10251260@unknown@formal@none@1@S@In addition to the root words and the rules for combining them, a learner of Esperanto must memorize some idiomatic compounds that are not entirely straightforward.@@@@1@26@@danf@17-8-2009 10251270@unknown@formal@none@1@S@For example, ''eldoni'', literally "to give out", is used for "to publish" (a [[calque]] of words in several European languages with the same derivation), and ''vortaro'', literally "a collection of words", means "a glossary" or "a dictionary".@@@@1@37@@danf@17-8-2009 10251280@unknown@formal@none@1@S@Such forms are modeled after usage in some European languages, and speakers of other languages may find them illogical.@@@@1@19@@danf@17-8-2009 10251290@unknown@formal@none@1@S@Fossilized derivations inherited from Esperanto's source languages may be similarly obscure, such as the opaque connection the root word ''centralo'' "power station" has with ''centro'' "center".@@@@1@26@@danf@17-8-2009 10251300@unknown@formal@none@1@S@Compounds with ''-um-'' are overtly arbitrary, and must be learned individually, as ''-um-'' has no defined meaning.@@@@1@17@@danf@17-8-2009 10251310@unknown@formal@none@1@S@It turns ''dekstren'' "to the right" into ''dekstrumen'' "clockwise", and ''komuna'' "common/shared" into ''komunumo'' "community", for example.@@@@1@17@@danf@17-8-2009 10251320@unknown@formal@none@1@S@Nevertheless, there are not nearly as many idiomatic or [[slang]] words in Esperanto as in ethnic languages, as these tend to make international communication difficult, working against Esperanto's main goal.@@@@1@30@@danf@17-8-2009 10251330@unknown@formal@none@1@S@===Useful phrases===@@@@1@2@@danf@17-8-2009 10251340@unknown@formal@none@1@S@Here are some useful Esperanto phrases, with [[help:IPA|IPA]] transcriptions:@@@@1@9@@danf@17-8-2009 10251350@unknown@formal@none@1@S@* Hello: ''Saluton'' {{IPA|/sa.ˈlu.ton/}}@@@@1@4@@danf@17-8-2009 
10251360@unknown@formal@none@1@S@* What is your name?: ''Kiel vi nomiĝas?''@@@@1@8@@danf@17-8-2009 10251370@unknown@formal@none@1@S@{{IPA|/ˈki.el vi no.ˈmi.ʤas/}}@@@@1@3@@danf@17-8-2009 10251380@unknown@formal@none@1@S@* My name is...: ''Mi nomiĝas...''@@@@1@6@@danf@17-8-2009 10251390@unknown@formal@none@1@S@{{IPA|/mi no.ˈmi.ʤas/}}@@@@1@2@@danf@17-8-2009 10251400@unknown@formal@none@1@S@* How much (is it/are they)?: ''Kiom (estas)?''@@@@1@8@@danf@17-8-2009 10251410@unknown@formal@none@1@S@{{IPA|/ˈki.om ˈes.tas/}}@@@@1@2@@danf@17-8-2009 10251420@unknown@formal@none@1@S@* Here you are: ''Jen'' {{IPA|/jen/}}@@@@1@6@@danf@17-8-2009 10251430@unknown@formal@none@1@S@* Do you speak Esperanto?: ''Ĉu vi parolas Esperanton?''@@@@1@9@@danf@17-8-2009 10251440@unknown@formal@none@1@S@{{IPA|/ˈʧu vi pa.ˈro.las es.pe.ˈran.ton/}}@@@@1@4@@danf@17-8-2009 10251450@unknown@formal@none@1@S@* I do not understand you: ''Mi ne komprenas vin'' {{IPA|/mi ˈne kom.ˈpre.nas vin/}}@@@@1@14@@danf@17-8-2009 10251460@unknown@formal@none@1@S@* I like ''this'' one: ''Ĉi tiu plaĉas al mi'' {{IPA|/ʧi ˈti.u ˈpla.ʧas al ˈmi/}} or ''Mi ŝatas tiun ĉi'' {{IPA|/mi ˈʃa.tas ˈti.un ˈʧi/}}@@@@1@24@@danf@17-8-2009 10251470@unknown@formal@none@1@S@* Thank you: ''Dankon'' {{IPA|/ˈdan.kon/}}@@@@1@5@@danf@17-8-2009 10251480@unknown@formal@none@1@S@* You're welcome: ''Ne dankinde'' {{IPA|/ˈne dan.ˈkin.de/}}@@@@1@7@@danf@17-8-2009 10251490@unknown@formal@none@1@S@* Please: ''Bonvolu'' {{IPA|/bon.ˈvo.lu/}} or ''mi petas'' {{IPA|/mi ˈpe.tas/}}@@@@1@9@@danf@17-8-2009 10251500@unknown@formal@none@1@S@* Here's to your health: ''Je via sano'' {{IPA|/je ˈvi.a ˈsa.no/}}@@@@1@11@@danf@17-8-2009 10251510@unknown@formal@none@1@S@* Bless you!/Gesundheit!: ''Sanon!''@@@@1@4@@danf@17-8-2009 10251520@unknown@formal@none@1@S@{{IPA|/ˈsa.non/}}@@@@1@1@@danf@17-8-2009 10251530@unknown@formal@none@1@S@* Congratulations!: ''Gratulon!''@@@@1@3@@danf@17-8-2009 10251540@unknown@formal@none@1@S@{{IPA|/ɡra.ˈtu.lon/}}@@@@1@1@@danf@17-8-2009 10251550@unknown@formal@none@1@S@* Okay: ''Bone'' {{IPA|/ˈbo.ne/}} or ''Ĝuste'' {{IPA|/ˈʤus.te/}}@@@@1@7@@danf@17-8-2009 10251560@unknown@formal@none@1@S@* Yes: ''Jes'' {{IPA|/ˈjes/}}@@@@1@4@@danf@17-8-2009 10251570@unknown@formal@none@1@S@* No: ''Ne'' {{IPA|/ˈne/}}@@@@1@4@@danf@17-8-2009 10251580@unknown@formal@none@1@S@* It is a nice day: ''Estas bela tago'' {{IPA|/ˈes.tas ˈbe.la ˈta.ɡo/}}@@@@1@12@@danf@17-8-2009 10251590@unknown@formal@none@1@S@* I love you: ''Mi amas vin'' {{IPA|/mi ˈa.mas vin/}}@@@@1@10@@danf@17-8-2009 10251600@unknown@formal@none@1@S@* Goodbye: ''Ĝis (la) (revido)'' {{IPA|/ʤis la re.ˈvi.do/}}@@@@1@8@@danf@17-8-2009 10251610@unknown@formal@none@1@S@* One beer, please: ''Unu bieron, mi petas.''@@@@1@8@@danf@17-8-2009 10251620@unknown@formal@none@1@S@{{IPA|/ˈu.nu bi.ˈe.ron, mi ˈpe.tas/}}@@@@1@4@@danf@17-8-2009 10251630@unknown@formal@none@1@S@* What is that?: ''Kio estas tio?''@@@@1@7@@danf@17-8-2009 10251640@unknown@formal@none@1@S@{{IPA|/ˈki.o ˈes.tas ˈti.o/}}@@@@1@3@@danf@17-8-2009 10251650@unknown@formal@none@1@S@* That is...: ''Tio estas...''@@@@1@5@@danf@17-8-2009 10251660@unknown@formal@none@1@S@{{IPA|/ˈti.o ˈes.tas/}}@@@@1@2@@danf@17-8-2009 10251670@unknown@formal@none@1@S@* How are you?: ''Kiel vi (fartas)?''@@@@1@7@@danf@17-8-2009 10251680@unknown@formal@none@1@S@{{IPA|/ˈki.el vi ˈfar.tas/}}@@@@1@3@@danf@17-8-2009 10251690@unknown@formal@none@1@S@* Good morning!: ''Bonan matenon!''@@@@1@5@@danf@17-8-2009 10251700@unknown@formal@none@1@S@{{IPA|/ˈbo.nan ma.ˈte.non/}}@@@@1@2@@danf@17-8-2009 
10251710@unknown@formal@none@1@S@* Good evening!: ''Bonan vesperon!''@@@@1@5@@danf@17-8-2009 10251720@unknown@formal@none@1@S@{{IPA|/ˈbo.nan ves.ˈpe.ron/}}@@@@1@2@@danf@17-8-2009 10251730@unknown@formal@none@1@S@* Good night!: ''Bonan nokton!''@@@@1@5@@danf@17-8-2009 10251740@unknown@formal@none@1@S@{{IPA|/ˈbo.nan ˈnok.ton/}}@@@@1@2@@danf@17-8-2009 10251750@unknown@formal@none@1@S@* Peace!: ''Pacon!''@@@@1@3@@danf@17-8-2009 10251760@unknown@formal@none@1@S@{{IPA|/ˈpa.tson/}}@@@@1@1@@danf@17-8-2009 10251770@unknown@formal@none@1@S@=== Sample text ===@@@@1@4@@danf@17-8-2009 10251780@unknown@formal@none@1@S@The following short extract gives an idea of the character of Esperanto.@@@@1@12@@danf@17-8-2009 10251790@unknown@formal@none@1@S@(Pronunciation is covered above.@@@@1@4@@danf@17-8-2009 10251800@unknown@formal@none@1@S@The main point for English speakers to remember is that the letter 'J' has the sound of the letter 'Y' in English)@@@@1@22@@danf@17-8-2009 10251810@unknown@formal@none@1@S@* Esperanto text@@@@1@3@@danf@17-8-2009 10251820@unknown@formal@none@1@S@:''En multaj lokoj de Ĉinio estis temploj de drako-reĝo. Dum trosekeco oni preĝis en la temploj, ke la drako-reĝo donu pluvon al la homa mondo.@@@@1@25@@danf@17-8-2009 10251830@unknown@formal@none@1@S@Tiam drako estis simbolo de la supernatura estaĵo. Kaj pli poste, ĝi fariĝis prapatro de la plej altaj regantoj kaj simbolis la absolutan aŭtoritaton de feŭda imperiestro.@@@@1@27@@danf@17-8-2009 10251840@unknown@formal@none@1@S@La imperiestro pretendis, ke li estas filo de la drako. Ĉiuj liaj vivbezonaĵoj portis la nomon drako kaj estis ornamitaj per diversaj drakofiguroj.@@@@1@23@@danf@17-8-2009 10251850@unknown@formal@none@1@S@Nun ĉie en Ĉinio videblas drako-ornamentaĵoj kaj cirkulas legendoj pri drakoj.''@@@@1@11@@danf@17-8-2009 10251860@unknown@formal@none@1@S@*English Translation:@@@@1@2@@danf@17-8-2009 10251870@unknown@formal@none@1@S@:In many places in China there were temples of the dragon king.@@@@1@12@@danf@17-8-2009 10251880@unknown@formal@none@1@S@During times of drought, people prayed in the temples, that the dragon king would give rain to the human world.@@@@1@20@@danf@17-8-2009 10251890@unknown@formal@none@1@S@At that time the dragon was a symbol of the supernatural.@@@@1@11@@danf@17-8-2009 10251900@unknown@formal@none@1@S@Later on, it became the ancestor of the highest rulers and symbolised the absolute authority of the feudal emperor.@@@@1@19@@danf@17-8-2009 10251910@unknown@formal@none@1@S@The emperor claimed to be the son of the dragon.@@@@1@10@@danf@17-8-2009 10251920@unknown@formal@none@1@S@All of his personal possessions carried the name ''dragon'' and were decorated with various dragon figures.@@@@1@16@@danf@17-8-2009 10251930@unknown@formal@none@1@S@Now everywhere in China dragon decorations can be seen and there circulate legends about dragons.@@@@1@15@@danf@17-8-2009 10251940@unknown@formal@none@1@S@== Education ==@@@@1@3@@danf@17-8-2009 10251950@unknown@formal@none@1@S@The majority of Esperanto speakers learn the language through self-directed study, online tutorials, and correspondence courses taught by volunteers.@@@@1@19@@danf@17-8-2009 10251960@unknown@formal@none@1@S@In more recent years, teaching websites like ''[[lernu!]]'' have become popular.@@@@1@11@@danf@17-8-2009 10251970@unknown@formal@none@1@S@Esperanto instruction is occasionally available at schools, such as a [[Esperanto#Esperanto and language acquisition|pilot project involving four primary schools]] under the supervision of the [[University 
of Manchester]], and by one count at 69 universities.@@@@1@34@@danf@17-8-2009 10251980@unknown@formal@none@1@S@However, outside of [[China]] and [[Hungary]], these mostly involve informal arrangements rather than dedicated departments or state sponsorship.@@@@1@18@@danf@17-8-2009 10251990@unknown@formal@none@1@S@[[Eötvös Loránd University]] in Budapest had a department of Interlinguistics and Esperanto from 1966 to 2004, after which time instruction moved to vocational colleges; there are state examinations for Esperanto instructors.@@@@1@31@@danf@17-8-2009 10252000@unknown@formal@none@1@S@Various educators have estimated that Esperanto can be learned in anywhere from one quarter to one twentieth the amount of time required for other languages.@@@@1@25@@danf@17-8-2009 10252010@unknown@formal@none@1@S@Some argue, however, that this is only true for native speakers of Western European languages.@@@@1@15@@danf@17-8-2009 10252020@unknown@formal@none@1@S@[[Claude Piron]], a psychologist formerly at the [[University of Geneva]] and Chinese-English-Russian-Spanish translator for the United Nations, argued that Esperanto is far more "brain friendly" than many ethnic languages.@@@@1@29@@danf@17-8-2009 10252030@unknown@formal@none@1@S@"Esperanto relies entirely on innate reflexes [and] differs from all other languages in that you can always trust your natural tendency to generalize patterns. [...]@@@@1@25@@danf@17-8-2009 10252040@unknown@formal@none@1@S@The same [[neuropsychology|neuropsychological]] law [— called by] [[Jean Piaget]] ''generalizing assimilation'' — applies to word formation as well as to grammar."@@@@1@21@@danf@17-8-2009 10252050@unknown@formal@none@1@S@=== Language acquisition ===@@@@1@4@@danf@17-8-2009 10252060@unknown@formal@none@1@S@Four primary schools in Britain, with some 230 pupils, are currently following a course in "propedeutic Esperanto", under the supervision of the University of Manchester.@@@@1@25@@danf@17-8-2009 10252070@unknown@formal@none@1@S@That is, instruction in Esperanto to raise language awareness and accelerate subsequent learning of foreign languages.@@@@1@16@@danf@17-8-2009 10252080@unknown@formal@none@1@S@Several studies demonstrate that studying Esperanto before another foreign language speeds and improves learning the second language to a greater extent than other languages which have been investigated.@@@@1@28@@danf@17-8-2009 10252090@unknown@formal@none@1@S@This appears to be because learning subsequent foreign languages is easier than learning one's first, while the use of a grammatically simple and culturally flexible auxiliary language like Esperanto lessens the first-language learning hurdle.@@@@1@34@@danf@17-8-2009 10252100@unknown@formal@none@1@S@In one study, a group of European [[secondary school]] students studied Esperanto for one year, then French for three years, and ended up with a significantly better command of French than a control group, who studied French for all four years.@@@@1@41@@danf@17-8-2009 10252110@unknown@formal@none@1@S@Similar results were found when the course of study was reduced to two years, of which six months was spent learning Esperanto.@@@@1@22@@danf@17-8-2009 10252120@unknown@formal@none@1@S@Results are not yet available from a study in Australia to see if similar benefits would occur for learning East Asian languages, but the pupils taking Esperanto did better and enjoyed the subject more than those taking other languages.@@@@1@39@@danf@17-8-2009 10252130@unknown@formal@none@1@S@== Community 
==@@@@1@3@@danf@17-8-2009 10252140@unknown@formal@none@1@S@=== Geography and demography ===@@@@1@5@@danf@17-8-2009 10252150@unknown@formal@none@1@S@Esperanto speakers are more numerous in Europe and East [[Asia]] than in the Americas, [[Africa]], and [[Oceania]], and more numerous in [[urban area|urban]] than in [[rural]] areas.@@@@1@27@@danf@17-8-2009 10252160@unknown@formal@none@1@S@Esperanto is particularly prevalent in the northern and eastern countries of Europe; in China, [[Korea]], Japan, and [[Iran]] within Asia; in [[Brazil]], [[Argentina]], and [[Mexico]] in the Americas; and in [[Togo]] in Africa.@@@@1@33@@danf@17-8-2009 10252170@unknown@formal@none@1@S@====Number of speakers====@@@@1@3@@danf@17-8-2009 10252180@unknown@formal@none@1@S@An estimate of the number of Esperanto speakers was made by the late [[Sidney S. Culbert]], a [[retirement|retired]] [[psychology]] [[professor]] at the [[University of Washington]] and a longtime Esperantist, who tracked down and tested Esperanto speakers in sample areas in dozens of countries over a period of twenty years.@@@@1@49@@danf@17-8-2009 10252190@unknown@formal@none@1@S@Culbert concluded that between one and two million people speak Esperanto at [[ILR or Foreign Service Level language ability measures|Foreign Service Level 3]], "professionally proficient" (able to communicate moderately complex ideas without hesitation, and to follow speeches, radio broadcasts, etc.).@@@@1@40@@danf@17-8-2009 10252200@unknown@formal@none@1@S@Culbert's estimate was not made for Esperanto alone, but formed part of his listing of estimates for all languages of over 1 million speakers, published annually in the [[World Almanac|World Almanac and Book of Facts]].@@@@1@35@@danf@17-8-2009 10252210@unknown@formal@none@1@S@Culbert's most detailed account of his methodology is found in a 1989 letter to David Wolff .@@@@1@17@@danf@17-8-2009 10252220@unknown@formal@none@1@S@Since Culbert never published detailed intermediate results for particular countries and regions, it is difficult to independently gauge the accuracy of his results.@@@@1@23@@danf@17-8-2009 10252230@unknown@formal@none@1@S@In the Almanac, his estimates for numbers of language speakers were rounded to the nearest million, thus the number for Esperanto speakers is shown as 2 million.@@@@1@27@@danf@17-8-2009 10252240@unknown@formal@none@1@S@This latter figure appears in ''[[Ethnologue]]''.@@@@1@6@@danf@17-8-2009 10252250@unknown@formal@none@1@S@Assuming that this figure is accurate, that means that about 0.03% of the world's population speaks the language.@@@@1@18@@danf@17-8-2009 10252260@unknown@formal@none@1@S@This falls short of Zamenhof's goal of a [[international auxiliary language|universal language]], but it represents a level of popularity unmatched by any other constructed language.@@@@1@25@@danf@17-8-2009 10252270@unknown@formal@none@1@S@Marcus Sikosek (now [[Ziko van Dijk]]) has challenged this figure of 1.6 million as exaggerated.@@@@1@15@@danf@17-8-2009 10252280@unknown@formal@none@1@S@He estimated that even if Esperanto speakers were evenly distributed, assuming one million Esperanto speakers worldwide would lead one to expect about 180 in the city of [[Cologne, Germany|Cologne]].@@@@1@29@@danf@17-8-2009 10252290@unknown@formal@none@1@S@Van Dijk finds only 30 [[fluency|fluent]] speakers in that city, and similarly smaller than expected figures in several other places thought to have a larger-than-average concentration of Esperanto speakers.@@@@1@29@@danf@17-8-2009 
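Van Dijk's Cologne comparison is a simple proportional expectation, which the short Python sketch below makes explicit; the population figures are rough assumptions for illustration, since the text does not say which figures he used.
 # Expected Esperanto speakers in Cologne if one million speakers were
 # spread evenly over the world's population (all figures are assumptions).
 speakers_worldwide = 1_000_000       # the estimate being tested
 world_population   = 6_500_000_000   # rough mid-2000s world population
 cologne_population = 1_000_000       # rough population of Cologne
 expected = speakers_worldwide * cologne_population / world_population
 print(round(expected))               # roughly 150-180, against about 30 found
The point of the comparison is the order-of-magnitude shortfall rather than the exact figure.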
10252300@unknown@formal@none@1@S@He also notes that there are a total of about 20,000 members of the various Esperanto organizations (other estimates are higher).@@@@1@21@@danf@17-8-2009 10252310@unknown@formal@none@1@S@Though there are undoubtedly many Esperanto speakers who are not members of any Esperanto organization, he thinks it unlikely that there are fifty times more speakers than organization members.@@@@1@29@@danf@17-8-2009 10252320@unknown@formal@none@1@S@[[Finnish people|Finnish]] [[linguistics|linguist]] Jouko Lindstedt, an expert on native-born Esperanto speakers, presented the following scheme to show the overall proportions of language capabilities within the Esperanto community:@@@@1@27@@danf@17-8-2009 10252330@unknown@formal@none@1@S@* ''1,000 have Esperanto as their native language@@@@1@8@@danf@17-8-2009 10252340@unknown@formal@none@1@S@* ''10,000 speak it fluently@@@@1@5@@danf@17-8-2009 10252350@unknown@formal@none@1@S@* ''100,000 can use it actively@@@@1@6@@danf@17-8-2009 10252360@unknown@formal@none@1@S@* ''1,000,000 understand a large amount passively@@@@1@7@@danf@17-8-2009 10252370@unknown@formal@none@1@S@* ''10,000,000 have studied it to some extent at some time.''@@@@1@11@@danf@17-8-2009 10252380@unknown@formal@none@1@S@In the absence of Dr. Culbert's detailed sampling data, or any other census data, it is impossible to state the number of speakers with certainty.@@@@1@25@@danf@17-8-2009 10252390@unknown@formal@none@1@S@Few observers, probably, would challenge the following statement from the [[website]] of the [[World Esperanto Association]]:@@@@1@16@@danf@17-8-2009 10252400@unknown@formal@none@1@S@:Numbers of [[textbook]]s sold and membership of local societies put the number of people with some knowledge of the language in the hundreds of thousands and possibly millions.@@@@1@28@@danf@17-8-2009 10252410@unknown@formal@none@1@S@====Native speakers====@@@@1@2@@danf@17-8-2009 10252420@unknown@formal@none@1@S@Ethnologue reports estimates that there are 200 to 2000 native Esperanto speakers ''(denaskuloj),'' who have learned the language from birth from their Esperanto-speaking parents.@@@@1@24@@danf@17-8-2009 10252430@unknown@formal@none@1@S@This usually happens when Esperanto is the chief or only common language in an international family, but sometimes in a family of devoted Esperantists.@@@@1@24@@danf@17-8-2009 10252440@unknown@formal@none@1@S@The most famous native speaker of Esperanto is businessman [[George Soros]].@@@@1@11@@danf@17-8-2009 10252450@unknown@formal@none@1@S@Also notable is young Holocaust victim [[Petr Ginz]], whose drawing of the planet Earth as viewed from the moon was carried aboard the Space Shuttle ''[[Space Shuttle Columbia|Columbia]]'' in 2003 ([[STS-107]]).@@@@1@31@@danf@17-8-2009 10252460@unknown@formal@none@1@S@=== Culture ===@@@@1@3@@danf@17-8-2009 10252470@unknown@formal@none@1@S@Esperanto speakers can access an international [[culture]], including a large body of original as well as translated [[Esperanto literature|literature]].@@@@1@19@@danf@17-8-2009 10252480@unknown@formal@none@1@S@There are over 25,000 Esperanto books, both originals and translations, as well as several regularly distributed [[List of Esperanto magazines|Esperanto magazines]].@@@@1@21@@danf@17-8-2009 10252490@unknown@formal@none@1@S@Esperanto speakers use the language for free accommodations with [[Esperantist]]s in 92 countries using the [[Pasporta Servo]] or to develop [[pen pal]] friendships abroad through the Esperanto Pen Pal 
Service.@@@@1@30@@danf@17-8-2009 10252500@unknown@formal@none@1@S@Every year, 1,500-3,000 Esperanto speakers meet for the [[World Congress of Esperanto]] ''(Universala Kongreso de Esperanto)''.@@@@1@16@@danf@17-8-2009 10252510@unknown@formal@none@1@S@The [[European Esperanto Union]] ''(Eǔropa Esperanto-Unio)'' regroups the national Esperanto associations of the EU member states and holds congresses every two years.@@@@1@22@@danf@17-8-2009 10252520@unknown@formal@none@1@S@The most recent was in [[Maribor, Slovenia]], in July-August 2007.@@@@1@10@@danf@17-8-2009 10252530@unknown@formal@none@1@S@It attracted 256 delegates from 28 countries, including 2 members of the [[European Parliament]], Ms. [[Małgorzata Handzlik]] of [[Poland]] and Ms. [[Ljudmila Novak]] of [[Slovenia]].@@@@1@25@@danf@17-8-2009 10252540@unknown@formal@none@1@S@Historically, much [[Esperanto music]] has been in various folk traditions, such as ''Kaj Tiel Plu'', for example.@@@@1@17@@danf@17-8-2009 10252550@unknown@formal@none@1@S@In recent decades, more rock and other modern genres have appeared, an example being the Swedish band ''Persone''.@@@@1@18@@danf@17-8-2009 10252560@unknown@formal@none@1@S@There are also shared [[tradition]]s, such as [[Zamenhof Day]], and shared [[behaviour]] patterns.@@@@1@13@@danf@17-8-2009 10252570@unknown@formal@none@1@S@[[Esperantist]]s speak primarily in Esperanto at [[World Esperanto Congress|international Esperanto meetings]].@@@@1@11@@danf@17-8-2009 10252580@unknown@formal@none@1@S@Detractors of Esperanto occasionally criticize it as "having no culture".@@@@1@10@@danf@17-8-2009 10252590@unknown@formal@none@1@S@Proponents, such as Prof. [[Humphrey Tonkin]] of the [[University of Hartford]], observe that Esperanto is "culturally neutral by design, as it was intended to be a facilitator between cultures, not to be the carrier of any one national culture."@@@@1@39@@danf@17-8-2009 10252610@unknown@formal@none@1@S@The late [[Scotland|Scottish]] Esperanto author [[William Auld]] has written extensively on the subject, arguing that Esperanto is "the expression of a [[Esperanto as an international language|common human culture]], unencumbered by national frontiers.@@@@1@32@@danf@17-8-2009 10252620@unknown@formal@none@1@S@Thus it is considered a culture on its own."@@@@1@9@@danf@17-8-2009 10252630@unknown@formal@none@1@S@Others point to Esperanto's potential for strengthening a common European identity, as it combines features of several [[Esperanto etymology|European languages]].@@@@1@20@@danf@17-8-2009 10252640@unknown@formal@none@1@S@====In popular culture====@@@@1@3@@danf@17-8-2009 10252650@unknown@formal@none@1@S@Esperanto has been used in a number of films and novels.@@@@1@11@@danf@17-8-2009 10252660@unknown@formal@none@1@S@Typically, this is done either to add the exotic flavour of a foreign language without representing any particular ethnicity, or to avoid going to the trouble of inventing a new language.@@@@1@31@@danf@17-8-2009 10252670@unknown@formal@none@1@S@The [[Charlie Chaplin]] film ''[[The Great Dictator]]'' (1940) showed [[Warsaw ghetto|Jewish ghetto]] shops designated in Esperanto, each with the general Esperanto suffix ''-ejo'' (meaning "place for..."), in order to convey the atmosphere of some 'foreign' [[Eastern Europe|East European]] country without referencing any particular East European language.@@@@1@46@@danf@17-8-2009 10252680@unknown@formal@none@1@S@Two full-length [[feature film]]s have been produced with [[dialogue]] entirely in Esperanto: ''[[Angoroj]],'' in 1964, 
and ''[[Incubus (1965 film)|Incubus]],'' a 1965 [[B-movie]] horror film.@@@@1@24@@danf@17-8-2009 10252690@unknown@formal@none@1@S@[[Canada|Canadian]] actor [[William Shatner]] learned Esperanto to a limited level so that he could star in ''Incubus''.@@@@1@17@@danf@17-8-2009 10252700@unknown@formal@none@1@S@Other amateur productions have been made, such as a dramatisation of the novel ''Gerda Malaperis'' (Gerda Has Disappeared).@@@@1@18@@danf@17-8-2009 10252710@unknown@formal@none@1@S@A number of "mainstream" films in national languages have used Esperanto in some way, such as ''[[Gattaca]]'' (1997), in which Esperanto can be overheard on the public address system.@@@@1@29@@danf@17-8-2009 10252720@unknown@formal@none@1@S@In the 1994 film ''[[Street Fighter]]'', Esperanto is the native language of the fictional country of [[Shadaloo]], and in a barracks scene the soldiers of villain [[M. Bison]] sing a rousing Russian Army-style chorus, the "Bison Troopers Marching Song", in the language.@@@@1@42@@danf@17-8-2009 10252730@unknown@formal@none@1@S@Esperanto is also spoken and appears on signs in the film ''[[Blade: Trinity]]''.@@@@1@13@@danf@17-8-2009 10252740@unknown@formal@none@1@S@In the British comedy ''[[Red Dwarf]]'', [[Arnold Rimmer]] is seen attempting to learn Esperanto in a number of early episodes, including ''[[Kryten (Red Dwarf episode)|Kryten]]''.@@@@1@25@@danf@17-8-2009 10252750@unknown@formal@none@1@S@In the first season, signs on the titular spacecraft are in both English and Esperanto.@@@@1@15@@danf@17-8-2009 10252760@unknown@formal@none@1@S@Esperanto is used as the universal language in the far future of [[Harry Harrison]]'s ''[[Stainless Steel Rat]]'' and ''[[Deathworld]]'' stories.@@@@1@20@@danf@17-8-2009 10252770@unknown@formal@none@1@S@In a 1969 guest appearance on ''[[The Tonight Show]]'', [[Jay Silverheels]] of ''[[The Lone Ranger]]'' fame appeared in character as [[Tonto]] for a comedy sketch with [[Johnny Carson]], and claimed Esperanto skills as he sought new employment.@@@@1@37@@danf@17-8-2009 10252780@unknown@formal@none@1@S@The sketch ended with a statement of his ideal situation: "Tonto, to [[Toronto, Canada|Toronto]], for Esperanto, and pronto!"@@@@1@18@@danf@17-8-2009 10252790@unknown@formal@none@1@S@Also, in the [[Danny Phantom]] Episode, "Public Enemies", Danny, Tucker, and Sam come across a ghost wolf who speaks Esperanto, but only Tucker can understand at first.@@@@1@27@@danf@17-8-2009 10252800@unknown@formal@none@1@S@=== In Science ===@@@@1@4@@danf@17-8-2009 10252810@unknown@formal@none@1@S@In 1921 the [[French Academy of Sciences]] recommended using Esperanto for international scientific communication.@@@@1@14@@danf@17-8-2009 10252820@unknown@formal@none@1@S@A few scientists and mathematicians, such as [[Maurice René Fréchet|Maurice Fréchet]] (mathematics), [[John C. 
Wells]] (linguistics), [[Helmar Frank]] (pedagogy and cybernetics), and [[Nobel Prize in Economics|Nobel laureate]] [[Reinhard Selten]] (economics) have published part of their work in Esperanto.@@@@1@38@@danf@17-8-2009 10252830@unknown@formal@none@1@S@Frank and Selten were among the founders of the [[Akademio Internacia de la Sciencoj San Marino|International Academy of Sciences]] in [[San Marino]], sometimes called the "Esperanto University", where Esperanto is the primary language of teaching and administration.@@@@1@37@@danf@17-8-2009 10252840@unknown@formal@none@1@S@=== Goals of the movement ===@@@@1@6@@danf@17-8-2009 10252850@unknown@formal@none@1@S@Zamenhof's intention was to create an easy-to-learn language to foster international understanding.@@@@1@12@@danf@17-8-2009 10252860@unknown@formal@none@1@S@It was to serve as an international auxiliary language, that is, as a universal second language, not to replace ethnic languages.@@@@1@21@@danf@17-8-2009 10252870@unknown@formal@none@1@S@This goal was widely shared among Esperanto speakers in the early decades of the movement.@@@@1@15@@danf@17-8-2009 10252880@unknown@formal@none@1@S@Later, Esperanto speakers began to see the language and the culture that had grown up around it as ends in themselves, even if Esperanto is never adopted by the United Nations or other international organizations.@@@@1@35@@danf@17-8-2009 10252890@unknown@formal@none@1@S@Those Esperanto speakers who want to see Esperanto adopted officially or on a large scale worldwide are commonly called ''[[Finvenkismo|finvenkistoj]]'', from ''fina venko'', meaning "final victory", or ''pracelistoj'', from ''pracelo'', meaning "original goal".@@@@1@33@@danf@17-8-2009 10252900@unknown@formal@none@1@S@Those who focus on the intrinsic value of the language are commonly called ''[[Raumism|raŭmistoj]]'', from [[Rauma, Finland|Rauma]], [[Finland]], where a declaration on the near-term unlikelihood of the "fina venko" and the value of Esperanto culture was made at the International Youth Congress in 1980.@@@@1@44@@danf@17-8-2009 10252910@unknown@formal@none@1@S@These categories are, however, not mutually exclusive.@@@@1@7@@danf@17-8-2009 10252920@unknown@formal@none@1@S@The [[Prague Manifesto (Esperanto)|Prague Manifesto]] (1996) presents the views of the mainstream of the Esperanto movement and of its main organisation, the World Esperanto Association ([[World Esperanto Association|UEA]]).@@@@1@28@@danf@17-8-2009 10252930@unknown@formal@none@1@S@=== Symbols and flags ===@@@@1@5@@danf@17-8-2009 10252940@unknown@formal@none@1@S@In 1893, C. Rjabinis and P. 
Deullin designed and manufactured a lapel pin for Esperantists to identify each other.@@@@1@19@@danf@17-8-2009 10252950@unknown@formal@none@1@S@The design was a circular pin with a white background and a five pointed green star.@@@@1@16@@danf@17-8-2009 10252960@unknown@formal@none@1@S@The theme of the design was the hope of the [[Continent#Number of continents|five continents]] being united by a common language.@@@@1@20@@danf@17-8-2009 10252970@unknown@formal@none@1@S@The earliest flag, and the one most commonly used today, features a green five-pointed star against a white canton, upon a field of green.@@@@1@24@@danf@17-8-2009 10252980@unknown@formal@none@1@S@It was proposed to Zamenhof by [[Ireland|Irishman]] Richard Geoghegan, author of the first Esperanto textbook for English speakers, in 1887.@@@@1@20@@danf@17-8-2009 10252990@unknown@formal@none@1@S@In 1905, delegates to the first conference of Esperantists at Boulogne-sur-Mer unanimously approved a version that differed from the modern flag only by the superimposition of an "E" over the green star.@@@@1@32@@danf@17-8-2009 10253000@unknown@formal@none@1@S@Other variants include that for Christian Esperantists, with a white [[Christian cross]] superimposed upon the green star, and that for Leftists, with [[Red flag|the color of the field changed from green to red]].@@@@1@33@@danf@17-8-2009 10253010@unknown@formal@none@1@S@In 1987, a second flag design was chosen in a contest organized by the UEA celebrating the first centennial of the language.@@@@1@22@@danf@17-8-2009 10253020@unknown@formal@none@1@S@It featured a white background with two stylised curved "E"s facing each other.@@@@1@13@@danf@17-8-2009 10253030@unknown@formal@none@1@S@Dubbed the "jubilea simbolo" ([[Esperanto jubilee symbol|jubilee symbol]]) , it attracted criticism from some Esperantists, who dubbed it the "melono" (melon) because of the design's elliptical shape.@@@@1@27@@danf@17-8-2009 10253040@unknown@formal@none@1@S@It is still in use, though to a lesser degree than the traditional symbol, known as the "verda stelo" (green star).@@@@1@21@@danf@17-8-2009 10253050@unknown@formal@none@1@S@=== Religion ===@@@@1@3@@danf@17-8-2009 10253060@unknown@formal@none@1@S@Esperanto has served an important role in several religions, such as [[Oomoto]] from Japan and [[Baha'i]] from Iran, and has been encouraged by others.@@@@1@24@@danf@17-8-2009 10253070@unknown@formal@none@1@S@==== Oomoto ====@@@@1@3@@danf@17-8-2009 10253080@unknown@formal@none@1@S@The [[Oomoto]] religion encourages the use of Esperanto among their followers and includes Zamenhof as one of its deified spirits.@@@@1@20@@danf@17-8-2009 10253090@unknown@formal@none@1@S@==== Bahá'í Faith====@@@@1@3@@danf@17-8-2009 10253100@unknown@formal@none@1@S@The [[Bahá'í Faith]] encourages the [[Bahá'í Faith and auxiliary language|use of an auxiliary international language]].@@@@1@15@@danf@17-8-2009 10253110@unknown@formal@none@1@S@While endorsing no specific language, some Bahá'ís see Esperanto as having great potential in this role.@@@@1@16@@danf@17-8-2009 10253120@unknown@formal@none@1@S@[[Lidja Zamenhof]], the daughter of Esperanto founder [[L. L. 
Zamenhof]], became a Bahá'í.@@@@1@13@@danf@17-8-2009 10253130@unknown@formal@none@1@S@Various volumes of the [[Bahá'í literature]]s and other Baha'i books have been translated into Esperanto.@@@@1@15@@danf@17-8-2009 10253140@unknown@formal@none@1@S@==== Spiritism ====@@@@1@3@@danf@17-8-2009 10253150@unknown@formal@none@1@S@Esperanto is also actively promoted, at least in [[Brazil]], by followers of [[Spiritism]].@@@@1@13@@danf@17-8-2009 10253160@unknown@formal@none@1@S@The Brazilian Spiritist Federation publishes Esperanto coursebooks, translations of [[Spiritist Codification|Spiritism's basic books]], and encourages Spiritists to become Esperantists.@@@@1@19@@danf@17-8-2009 10253170@unknown@formal@none@1@S@==== Bible translations ====@@@@1@4@@danf@17-8-2009 10253180@unknown@formal@none@1@S@The first translation of the [[Bible]] into Esperanto was a translation of the [[Tanach]] or Old Testament done by [[L. L. Zamenhof]].@@@@1@22@@danf@17-8-2009 10253190@unknown@formal@none@1@S@The translation was reviewed and compared with other languages' translations by a group of British clergy and scholars before publishing it at the [[British and Foreign Bible Society]] in 1910.@@@@1@30@@danf@17-8-2009 10253200@unknown@formal@none@1@S@In 1926 this was published along with a New Testament translation, in an edition commonly called the "Londona Biblio".@@@@1@19@@danf@17-8-2009 10253210@unknown@formal@none@1@S@In the 1960s, the ''Internacia Asocio de Bibliistoj kaj Orientalistoj'' tried to organize a new, ecumenical Esperanto Bible version.@@@@1@19@@danf@17-8-2009 10253220@unknown@formal@none@1@S@Since then, the Dutch Lutheran pastor Gerrit Berveling has translated the [[Deuterocanonical]] or apocryphal books in addition to new translations of the Gospels, some of the New Testament epistles, and some books of the Tanakh or Old Testament.@@@@1@38@@danf@17-8-2009 10253230@unknown@formal@none@1@S@These have been published in various separate booklets, or serialized in ''Dia Regno'', but the [[Deuterocanonical]] books have appeared in recent editions of the Londona Biblio.@@@@1@26@@danf@17-8-2009 10253240@unknown@formal@none@1@S@==== Christianity ====@@@@1@3@@danf@17-8-2009 10253250@unknown@formal@none@1@S@Two Roman Catholic popes, [[Pope John Paul II|John Paul II]] and [[Pope Benedict XVI|Benedict XVI]], have regularly used Esperanto in their multilingual ''[[urbi et orbi]]'' blessings at Easter and Christmas each year since Easter 1994.@@@@1@35@@danf@17-8-2009 10253260@unknown@formal@none@1@S@Christian Esperanto organizations include two that were formed early in the history of Esperanto, the [[International Union of Catholic Esperantists]] and the [[List of Esperanto organizations#Religion|International Christian Esperantists League]].@@@@1@29@@danf@17-8-2009 10253270@unknown@formal@none@1@S@An issue of "The Friend" describes the activities of the [[Quaker]] Esperanto Society.@@@@1@13@@danf@17-8-2009 10253280@unknown@formal@none@1@S@There are instances of Christian apologists and teachers who use Esperanto as a medium.@@@@1@14@@danf@17-8-2009 10253290@unknown@formal@none@1@S@[[Nigeria]]n [[Pastor]] Bayo Afolaranmi's "[http://groups.yahoo.com/group/spiritanutrajxo/ Spirita nutraĵo]" (spiritual food) Yahoo mailing list, for example, has hosted weekly messages since 2003.@@@@1@20@@danf@17-8-2009 10253300@unknown@formal@none@1@S@[[Chick Publications]], publisher of [[Fundamentalist Christianity|Protestant fundamentalist]] themed evangelistic tracts, has published a number of comic book style tracts by 
[[Jack T. Chick]] translated into Esperanto, including "This Was Your Life!"@@@@1@31@@danf@17-8-2009 10253310@unknown@formal@none@1@S@("Jen Via Tuto Vivo!")@@@@1@4@@danf@17-8-2009 10253320@unknown@formal@none@1@S@==== Islam ====@@@@1@3@@danf@17-8-2009 10253330@unknown@formal@none@1@S@[[Ayatollah Khomeini]] of [[Iran]] called on Muslims to learn Esperanto and praised its use as a medium for better understanding among peoples of different religious backgrounds.@@@@1@26@@danf@17-8-2009 10253340@unknown@formal@none@1@S@After he suggested that Esperanto replace English as an international [[lingua franca]], it began to be used in the seminaries of [[Qom]].@@@@1@22@@danf@17-8-2009 10253350@unknown@formal@none@1@S@An Esperanto translation of the [[Qur'an]] was published by the state shortly thereafter.@@@@1@13@@danf@17-8-2009 10253360@unknown@formal@none@1@S@In 1981, Khomeini and the Iranian government began to oppose Esperanto after realising that followers of the [[Bahá'í Faith]] were interested in it.@@@@1@23@@danf@17-8-2009 10253370@unknown@formal@none@1@S@== Criticism ==@@@@1@3@@danf@17-8-2009 10253380@unknown@formal@none@1@S@Esperanto was conceived as a language of international communication, more precisely as a universal [[second language]].@@@@1@16@@danf@17-8-2009 10253390@unknown@formal@none@1@S@Since publication, there has been debate over whether it is possible for Esperanto to attain this position, and whether it would be an improvement for international communication if it did.@@@@1@30@@danf@17-8-2009 10253400@unknown@formal@none@1@S@There have been a number of attempts to reform the language, the most well-known of which is the language [[Ido]] which resulted in a schism in the community at the time, beginning in 1907.@@@@1@34@@danf@17-8-2009 10253410@unknown@formal@none@1@S@Since Esperanto is a planned language, there have been many, often passionate, criticisms of minor points which are too numerous to cover here, such as Zamenhof's choice of the word ''edzo'' over something like ''spozo'' for "husband, spouse", or his choice of the Classic Greek and Old Latin singular and plural endings ''-o, -oj, -a, -aj'' over their Medieval contractions ''-o, -i, -a, -e.''@@@@1@64@@danf@17-8-2009 10253420@unknown@formal@none@1@S@(Both these changes were adopted by the Ido reform, though Ido dispensed with adjectival agreement altogether.)@@@@1@16@@danf@17-8-2009 10253430@unknown@formal@none@1@S@See the links [[Esperanto#Criticism|below]] for examples of more general criticism.@@@@1@10@@danf@17-8-2009 10253440@unknown@formal@none@1@S@The more common points include:@@@@1@5@@danf@17-8-2009 10253450@unknown@formal@none@1@S@* Esperanto has failed the expectations of its founder to become a universal second language.@@@@1@15@@danf@17-8-2009 10253460@unknown@formal@none@1@S@Although many promoters of Esperanto stress the few successes it has had, the fact remains that well over a century since its publication, the portion of the world that speaks Esperanto, and the number of primary and secondary schools which teach it, remain minuscule.@@@@1@44@@danf@17-8-2009 10253470@unknown@formal@none@1@S@It simply cannot compete with English in this regard.@@@@1@9@@danf@17-8-2009 10253480@unknown@formal@none@1@S@* The vocabulary and grammar are based on major European languages, and are not universal.@@@@1@15@@danf@17-8-2009 10253490@unknown@formal@none@1@S@Often this criticism is specific to a few points such as adjectival agreement and the accusative case (generally such obvious details are all that reform 
projects suggest changing), but sometimes it is more general: Both the grammar and the 'international' vocabulary are difficult for many Asians, among others, and give an unfair advantage to speakers of European languages.@@@@1@58@@danf@17-8-2009 10253500@unknown@formal@none@1@S@One attempt to address this issue is [[Lojban]], which draws from the six populous languages [[Arabic language|Arabic]], [[Chinese language|Chinese]], [[English language|English]], [[Hindi]], [[Russian language|Russian]], and [[Spanish language|Spanish]], and whose grammar is designed for computer parsing.@@@@1@35@@danf@17-8-2009 10253510@unknown@formal@none@1@S@* The vocabulary, diacritic letters, and grammar are too dissimilar from the major Western European languages, and therefore Esperanto is not as easy as it could be for speakers of those languages to learn.@@@@1@34@@danf@17-8-2009 10253520@unknown@formal@none@1@S@Attempts to address this issue include the younger planned languages [[Ido]] and [[Interlingua]].@@@@1@13@@danf@17-8-2009 10253530@unknown@formal@none@1@S@* Esperanto phonology is unimaginatively provincial, being essentially [[Belorussian language|Belorussian]] with regularized stress, leaving out only the [[nasal vowel]]s, [[palatalization|palatalized consonants]], and /dz/.@@@@1@23@@danf@17-8-2009 10253540@unknown@formal@none@1@S@For example, Esperanto has phonemes such as {{IPA|/x/, /ʒ/, /ts/, /eu̯/}} ''(ĥ, ĵ, c, eŭ)'' which are rare as distinct phonemes outside Europe.@@@@1@23@@danf@17-8-2009 10253550@unknown@formal@none@1@S@(Note that none of these are found in initial position in English.)@@@@1@12@@danf@17-8-2009 10253560@unknown@formal@none@1@S@* Esperanto has no culture.@@@@1@5@@danf@17-8-2009 10253570@unknown@formal@none@1@S@Although it has a large international literature, Esperanto does not encapsulate a specific culture.@@@@1@14@@danf@17-8-2009 10253580@unknown@formal@none@1@S@* Esperanto is culturally European.@@@@1@5@@danf@17-8-2009 10253590@unknown@formal@none@1@S@This is due to the European derivation of its vocabulary, and more insidiously, its [[semantics]]; both infuse the language with a European world view.@@@@1@24@@danf@17-8-2009 10253600@unknown@formal@none@1@S@* The vocabulary is too large.@@@@1@6@@danf@17-8-2009 10253610@unknown@formal@none@1@S@Rather than deriving new words from existing roots, large numbers of new roots are adopted into the language by people who think they're international, when in fact they're only European.@@@@1@30@@danf@17-8-2009 10253620@unknown@formal@none@1@S@This makes the language much more difficult for non-Europeans than it needs to be.@@@@1@14@@danf@17-8-2009 10253630@unknown@formal@none@1@S@* Esperanto is [[sexism|sexist]].@@@@1@4@@danf@17-8-2009 10253640@unknown@formal@none@1@S@As in English, there is no neutral pronoun for ''s/he,'' and most kin terms and titles are masculine by default and only feminine when so specified.@@@@1@26@@danf@17-8-2009 10253650@unknown@formal@none@1@S@There have been many attempts to address this issue, of which one of the better known is [[Riism]].@@@@1@18@@danf@17-8-2009 10253660@unknown@formal@none@1@S@* Esperanto is, looks, or sounds artificial.@@@@1@7@@danf@17-8-2009 10253670@unknown@formal@none@1@S@This criticism is primarily due to the letters with circumflex diacritics, which some find odd or cumbersome, and to the lack of fluent speakers: Few Esperantists have spent much time with fluent, let alone native, speakers, and many learn Esperanto relatively late in life, and so speak haltingly, which can 
create a negative impression among non-speakers.@@@@1@56@@danf@17-8-2009 10253680@unknown@formal@none@1@S@Among fluent speakers, Esperanto sounds no more artificial than any other language.@@@@1@12@@danf@17-8-2009 10253690@unknown@formal@none@1@S@Others claim that an artificial language will necessarily be deficient, due to its very nature, but the [[Hungarian Academy of Sciences]] has found that Esperanto fulfills all the requirements of a living language.@@@@1@33@@danf@17-8-2009 10253700@unknown@formal@none@1@S@== Modifications ==@@@@1@3@@danf@17-8-2009 10253710@unknown@formal@none@1@S@Though Esperanto itself has changed little since the publication of the ''[[Fundamento de Esperanto]]'' (Foundation of Esperanto), a number of reform projects have been proposed over the years, starting with [[Reformed Esperanto|Zamenhof's proposals in 1894]] and [[Ido]] in 1907.@@@@1@39@@danf@17-8-2009 10253720@unknown@formal@none@1@S@Several later constructed languages, such as Fasile, were based on Esperanto.@@@@1@11@@danf@17-8-2009 10253730@unknown@formal@none@1@S@In modern times, attempts have been made to eliminate perceived sexism in the language.@@@@1@14@@danf@17-8-2009 10253740@unknown@formal@none@1@S@One example of this is [[Riism]].@@@@1@6@@danf@17-8-2009 10253750@unknown@formal@none@1@S@However, as Esperanto has become a living language, changes are as difficult to implement as in ethnic languages.@@@@1@18@@danf@17-8-2009 10260010@unknown@formal@none@1@S@
Formal grammar
@@@@1@2@@danf@17-8-2009 10260020@unknown@formal@none@1@S@In [[formal semantics]], [[computer science]] and [[linguistics]], a '''formal grammar''' (also called '''formation rules''') is a precise description of a [[formal language]] – that is, of a [[set]] of [[String (computer science)|strings]] over some [[Alphabet (computer science)|alphabet]].@@@@1@37@@danf@17-8-2009 10260030@unknown@formal@none@1@S@In other words, a grammar describes which of the possible sequences of symbols (strings) in a language constitute valid words or statements in that language, but it does not describe their [[semantics]] (i.e. what they mean).@@@@1@36@@danf@17-8-2009 10260040@unknown@formal@none@1@S@The branch of mathematics that is concerned with the properties of formal grammars and languages is called [[formal language theory]].@@@@1@20@@danf@17-8-2009 10260050@unknown@formal@none@1@S@A grammar is usually regarded as a means to [[generate]] all the valid strings of a language; it can also be used as the basis for a [[recognizer]] that determines for any given string whether it is [[grammatical]] (i.e. belongs to the language).@@@@1@43@@danf@17-8-2009 10260060@unknown@formal@none@1@S@To describe such recognizers, formal language theory uses separate formalisms, known as [[automata theory|automata]].@@@@1@14@@danf@17-8-2009 10260070@unknown@formal@none@1@S@A grammar can also be used to [[analyze]] the strings of a language – i.e. to describe their internal structure.@@@@1@20@@danf@17-8-2009 10260080@unknown@formal@none@1@S@In computer science, this process is known as [[parsing]].@@@@1@9@@danf@17-8-2009 10260090@unknown@formal@none@1@S@Most languages have very [[compositional semantics]], i.e. the meaning of their utterances is structured according to their [[syntax]]; therefore, the first step to describing the meaning of an utterance in language is to analyze it and look at its analyzed form (known as its [[parse tree]] in computer science, and as its [[deep structure]] in [[generative grammar]]).@@@@1@57@@danf@17-8-2009 10260100@unknown@formal@none@1@S@== Background ==@@@@1@3@@danf@17-8-2009 10260110@unknown@formal@none@1@S@=== Formal language ===@@@@1@4@@danf@17-8-2009 10260120@unknown@formal@none@1@S@A ''formal language'' is an organized [[set]] of [[symbol]]s the essential feature of which is that it can be precisely defined in terms of just the shapes and locations of those symbols.@@@@1@32@@danf@17-8-2009 10260130@unknown@formal@none@1@S@Such a language can be defined, then, without any [[reference]] to any [[meaning (linguistics)|meaning]]s of any of its expressions; it can exist before any [[formal interpretation]] is assigned to it -- that is, before it has any meaning.@@@@1@38@@danf@17-8-2009 10260140@unknown@formal@none@1@S@First order logic is expressed in some formal language.@@@@1@9@@danf@17-8-2009 10260150@unknown@formal@none@1@S@A formal grammar determines which symbols and sets of symbols are [[Formula (mathematical logic)|formula]]s in a formal language.@@@@1@18@@danf@17-8-2009 10260160@unknown@formal@none@1@S@=== Formal systems ===@@@@1@4@@danf@17-8-2009 10260170@unknown@formal@none@1@S@A ''formal system'' (also called a ''logical calculus'', or a ''logical system'') consists of a formal language together with a [[deductive apparatus]] (also called a ''deductive system'').@@@@1@27@@danf@17-8-2009 10260180@unknown@formal@none@1@S@The deductive apparatus may consist of a set of [[transformation rule]]s (also called ''inference rules'') or a set of [[axiom]]s, or have 
both.@@@@1@23@@danf@17-8-2009 10260190@unknown@formal@none@1@S@A formal system is used to [[Proof theory|derive]] one expression from one or more other expressions.@@@@1@16@@danf@17-8-2009 10260200@unknown@formal@none@1@S@=== Formal proofs ===@@@@1@4@@danf@17-8-2009 10260210@unknown@formal@none@1@S@A ''formal proof'' is a sequence of well-formed formulas of a formal language, the last one of which is a [[theorem]] of a formal system.@@@@1@25@@danf@17-8-2009 10260220@unknown@formal@none@1@S@The theorem is a [[syntactic consequence]] of all the wffs preceding it in the proof.@@@@1@15@@danf@17-8-2009 10260230@unknown@formal@none@1@S@For a wff to qualify as part of a proof, it must be the result of applying a rule of the deductive apparatus of some formal system to the previous wffs in the proof sequence.@@@@1@35@@danf@17-8-2009 10260240@unknown@formal@none@1@S@=== Formal interpretations ===@@@@1@4@@danf@17-8-2009 10260250@unknown@formal@none@1@S@An ''interpretation'' of a formal system is the assignment of meanings to the symbols, and truth-values to the sentences of a formal system.@@@@1@23@@danf@17-8-2009 10260260@unknown@formal@none@1@S@The study of formal interpretations is called [[formal semantics]].@@@@1@9@@danf@17-8-2009 10260270@unknown@formal@none@1@S@''Giving an interpretation'' is synonymous with ''constructing a [[Structure (mathematical logic)|model]].@@@@1@11@@danf@17-8-2009 10260280@unknown@formal@none@1@S@== Formal grammars ==@@@@1@4@@danf@17-8-2009 10260290@unknown@formal@none@1@S@A grammar mainly consists of a set of rules for transforming strings.@@@@1@12@@danf@17-8-2009 10260300@unknown@formal@none@1@S@(If it ''only'' consisted of these rules, it would be a [[semi-Thue system]].)@@@@1@13@@danf@17-8-2009 10260310@unknown@formal@none@1@S@To generate a string in the language, one begins with a string consisting of only a single ''start symbol'', and then successively applies the rules (any number of times, in any order) to rewrite this string.@@@@1@36@@danf@17-8-2009 10260320@unknown@formal@none@1@S@The language consists of all the strings that can be generated in this manner.@@@@1@14@@danf@17-8-2009 10260330@unknown@formal@none@1@S@Any particular sequence of legal choices taken during this rewriting process yields one particular string in the language.@@@@1@18@@danf@17-8-2009 10260340@unknown@formal@none@1@S@If there are multiple ways of generating the same single string, then the grammar is said to be [[ambiguous grammar|ambiguous]].@@@@1@20@@danf@17-8-2009 10260350@unknown@formal@none@1@S@For example, assume the alphabet consists of a and b, the start symbol is S and we have the following rules:@@@@1@21@@danf@17-8-2009 10260360@unknown@formal@none@1@S@: 1. S \\rightarrow aSb@@@@1@5@@danf@17-8-2009 10260370@unknown@formal@none@1@S@: 2. 
S \\rightarrow ba@@@@1@5@@danf@17-8-2009 10260380@unknown@formal@none@1@S@then we start with S, and can choose a rule to apply to it.@@@@1@14@@danf@17-8-2009 10260390@unknown@formal@none@1@S@If we choose rule 1, we obtain the string aSb.@@@@1@10@@danf@17-8-2009 10260400@unknown@formal@none@1@S@If we choose rule 1 again, we replace S with aSb and obtain the string aaSbb.@@@@1@16@@danf@17-8-2009 10260410@unknown@formal@none@1@S@This process can be repeated at will until all occurrences of ''S'' are removed, and only symbols from the alphabet remain (i.e., a and b).@@@@1@25@@danf@17-8-2009 10260420@unknown@formal@none@1@S@For example, if we now choose rule 2, we replace S with ba and obtain the string aababb, and are done.@@@@1@21@@danf@17-8-2009 10260430@unknown@formal@none@1@S@We can write this series of choices more briefly, using symbols: S \\Rightarrow aSb \\Rightarrow aaSbb \\Rightarrow aababb.@@@@1@18@@danf@17-8-2009 10260440@unknown@formal@none@1@S@The language of the grammar is the set of all the strings that can be generated using this process: \\left \\{ba, abab, aababb, aaababbb, ...\\right \\}.@@@@1@26@@danf@17-8-2009 10260450@unknown@formal@none@1@S@=== Formal definition ===@@@@1@4@@danf@17-8-2009 10260460@unknown@formal@none@1@S@In the classic formalization of generative grammars first proposed by [[Noam Chomsky]] in the 1950s, a grammar ''G'' consists of the following components:@@@@1@23@@danf@17-8-2009 10260470@unknown@formal@none@1@S@* A finite set N of ''[[nonterminal symbol]]s''.@@@@1@8@@danf@17-8-2009 10260480@unknown@formal@none@1@S@* A finite set \\Sigma of ''[[terminal symbol]]s'' that is [[Disjoint sets|disjoint]] from N.@@@@1@14@@danf@17-8-2009 10260490@unknown@formal@none@1@S@* A finite set P of ''production rules'', each of the form@@@@1@12@@danf@17-8-2009 10260500@unknown@formal@none@1@S@:: (\\Sigma \\cup N)^{*} N (\\Sigma \\cup N)^{*} \\rightarrow (\\Sigma \\cup N)^{*} @@@@1@13@@danf@17-8-2009 10260510@unknown@formal@none@1@S@:where {}^{*} is the [[Kleene star]] operator and \\cup denotes [[union (set theory)|set union]].@@@@1@14@@danf@17-8-2009 10260520@unknown@formal@none@1@S@That is, each production rule maps from one string of symbols to another, where the first string contains at least one nonterminal symbol.@@@@1@23@@danf@17-8-2009 10260530@unknown@formal@none@1@S@In the case that the second string is the [[empty string]] – that is, that it contains no symbols at all – in order to avoid confusion, the empty string is often denoted with a special notation, often (\\lambda, e or \\epsilon.@@@@1@42@@danf@17-8-2009 10260540@unknown@formal@none@1@S@* A distinguished symbol S \\in N that is the ''start symbol''.@@@@1@12@@danf@17-8-2009 10260550@unknown@formal@none@1@S@A grammar is formally defined as the ordered quad-tuple (N, \\Sigma, P, S).@@@@1@13@@danf@17-8-2009 10260560@unknown@formal@none@1@S@Such a formal grammar is often called a ''rewriting system'' or a ''phrase structure grammar'' in the literature.@@@@1@18@@danf@17-8-2009 10260570@unknown@formal@none@1@S@The operation of a grammar can be defined in terms of relations on strings:@@@@1@14@@danf@17-8-2009 10260580@unknown@formal@none@1@S@* Given a grammar G = (N, \\Sigma, P, S), the binary relation \\Rightarrow_G (pronounced as "G derives in one step") on strings in (\\Sigma \\cup N)^{*} is defined by:@@@@1@30@@danf@17-8-2009 10260590@unknown@formal@none@1@S@x \\Rightarrow_G y \\mbox{ iff } \\exists u, v, w \\in \\Sigma^*, X \\in N: x = uXv \\wedge y = uwv \\wedge X \\rightarrow w \\in 
P@@@@1@28@@danf@17-8-2009 10260600@unknown@formal@none@1@S@* the relation {\\Rightarrow_G}^* (pronounced as ''G derives in zero or more steps'') is defined as the [[transitive closure]] of (\\Sigma \\cup N)^{*}@@@@1@23@@danf@17-8-2009 10260610@unknown@formal@none@1@S@* the ''language'' of G, denoted as \\boldsymbol{L}(G), is defined as all those strings over \\Sigma that can be generated by starting with the start symbol S and then applying the production rules in P until no more nonterminal symbols are present; that is, the set \\{ w \\in \\Sigma^* \\mid S {\\Rightarrow_G}^* w \\}.@@@@1@55@@danf@17-8-2009 10260620@unknown@formal@none@1@S@Note that the grammar G = (N, \\Sigma, P, S) is effectively the [[semi-Thue system]] (N \\cup \\Sigma, P), rewriting strings in exactly the same way; the only difference is in that we distinguish specific ''nonterminal'' symbols which must be rewritten in rewrite rules, and are only interested in rewritings from the designated start symbol S to strings without nonterminal symbols.@@@@1@61@@danf@17-8-2009 10260630@unknown@formal@none@1@S@=== Example ===@@@@1@3@@danf@17-8-2009 10260640@unknown@formal@none@1@S@''For these examples, formal languages are specified using [[set-builder notation]].''@@@@1@10@@danf@17-8-2009 10260650@unknown@formal@none@1@S@Consider the grammar G where N = \\left \\{S, B\\right \\}, \\Sigma = \\left \\{a, b, c\\right \\}, S is the start symbol, and P consists of the following production rules:@@@@1@31@@danf@17-8-2009 10260660@unknown@formal@none@1@S@: 1. S \\rightarrow aBSc@@@@1@5@@danf@17-8-2009 10260670@unknown@formal@none@1@S@: 2. S \\rightarrow abc@@@@1@5@@danf@17-8-2009 10260680@unknown@formal@none@1@S@: 3. Ba \\rightarrow aB@@@@1@5@@danf@17-8-2009 10260690@unknown@formal@none@1@S@: 4. 
Bb \\rightarrow bb @@@@1@6@@danf@17-8-2009 10260700@unknown@formal@none@1@S@Some examples of the derivation of strings in \\boldsymbol{L}(G) are:@@@@1@10@@danf@17-8-2009 10260710@unknown@formal@none@1@S@* \\boldsymbol{S} \\Rightarrow_2 \\boldsymbol{abc}@@@@1@4@@danf@17-8-2009 10260720@unknown@formal@none@1@S@* \\boldsymbol{S} \\Rightarrow_1 \\boldsymbol{aBSc} \\Rightarrow_2 aB\\boldsymbol{abc}c \\Rightarrow_3 a\\boldsymbol{aB}bcc \\Rightarrow_4 aa\\boldsymbol{bb}cc@@@@1@10@@danf@17-8-2009 10260730@unknown@formal@none@1@S@* \\boldsymbol{S} \\Rightarrow_1 \\boldsymbol{aBSc} \\Rightarrow_1 aB\\boldsymbol{aBSc}c \\Rightarrow_2 aBaB\\boldsymbol{abc}cc \\Rightarrow_3 a\\boldsymbol{aB}Babccc \\Rightarrow_3 aaB\\boldsymbol{aB}bccc \\Rightarrow_3 aa\\boldsymbol{aB}Bbccc \\Rightarrow_4 aaaB\\boldsymbol{bb}ccc \\Rightarrow_4 aaa\\boldsymbol{bb}bccc@@@@1@19@@danf@17-8-2009 10260740@unknown@formal@none@1@S@:(Note on notation: L \\Rightarrow_i R reads "''L'' generates ''R'' by means of production ''i''" and the generated part is each time indicated in bold.)@@@@1@25@@danf@17-8-2009 10260750@unknown@formal@none@1@S@This grammar defines the language L = \\left \\{ a^{n}b^{n}c^{n} | n \\ge 1 \\right \\} where a^{n} denotes a string of ''n'' consecutive a's.@@@@1@25@@danf@17-8-2009 10260760@unknown@formal@none@1@S@Thus, the language is the set of strings that consist of 1 or more a's, followed by the same number of b's, followed by the same number of c's.@@@@1@29@@danf@17-8-2009 10260770@unknown@formal@none@1@S@=== The Chomsky hierarchy ===@@@@1@5@@danf@17-8-2009 10260780@unknown@formal@none@1@S@When [[Noam Chomsky]] first formalized generative grammars in 1956, he classified them into types now known as the [[Chomsky hierarchy]].@@@@1@20@@danf@17-8-2009 10260790@unknown@formal@none@1@S@The difference between these types is that they have increasingly strict production rules and can express fewer formal languages.@@@@1@19@@danf@17-8-2009 10260800@unknown@formal@none@1@S@Two important types are ''[[context-free grammar]]s'' (Type 2) and ''[[regular grammar]]s'' (Type 3).@@@@1@13@@danf@17-8-2009 10260810@unknown@formal@none@1@S@The languages that can be described with such a grammar are called ''[[context-free language]]s'' and ''[[regular language]]s'', respectively.@@@@1@18@@danf@17-8-2009 10260820@unknown@formal@none@1@S@Although much less powerful than unrestricted grammars (Type 0), which can in fact express any language that can be accepted by a [[Turing machine]], these two restricted types of grammars are most often used because [[parsing|parser]]s for them can be efficiently implemented.@@@@1@42@@danf@17-8-2009 10260830@unknown@formal@none@1@S@For example, all regular languages can be recognized by a [[finite state machine]], and for useful subsets of context-free grammars there are well-known algorithms to generate efficient [[LL parser]]s and [[LR parser]]s to recognize the corresponding languages those grammars generate.@@@@1@40@@danf@17-8-2009 10260840@unknown@formal@none@1@S@==== Context-free grammars ====@@@@1@4@@danf@17-8-2009 10260850@unknown@formal@none@1@S@A ''[[context-free grammar]]'' is a grammar in which the left-hand side of each production rule consists of only a single nonterminal symbol.@@@@1@22@@danf@17-8-2009 10260860@unknown@formal@none@1@S@This restriction is non-trivial; not all languages can be generated by context-free grammars.@@@@1@13@@danf@17-8-2009 10260870@unknown@formal@none@1@S@Those that can are called ''context-free languages''.@@@@1@7@@danf@17-8-2009 
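To make the rewriting process described above concrete, the following sketch (in Python, added here for illustration and not part of the original article) generates strings of the first example grammar, with start symbol S and the rules S \\rightarrow aSb and S \\rightarrow ba, by repeatedly replacing an occurrence of a nonterminal with the right-hand side of one of its rules until only terminal symbols remain.
 # A minimal sketch (added for illustration) of the generative rewriting process,
 # using the example grammar with start symbol S and the production rules
 #   1. S -> aSb
 #   2. S -> ba
 import random

 RULES = {"S": ["aSb", "ba"]}      # nonterminal -> list of right-hand sides
 NONTERMINALS = set(RULES)

 def derive(start="S", max_steps=50, rng=random):
     """Return one generated string together with its derivation sequence."""
     current, derivation, steps = start, [start], 0
     while any(ch in NONTERMINALS for ch in current):
         if steps >= max_steps:
             raise RuntimeError("no terminal string reached within max_steps rewrites")
         positions = [i for i, ch in enumerate(current) if ch in NONTERMINALS]
         i = rng.choice(positions)               # pick an occurrence of a nonterminal
         rhs = rng.choice(RULES[current[i]])     # pick one of its production rules
         current = current[:i] + rhs + current[i + 1:]
         derivation.append(current)
         steps += 1
     return current, derivation

 word, derivation = derive()
 print(" => ".join(derivation))   # prints one derivation, e.g. S => aSb => aaSbb => aababb
Different random choices yield different members of the language, such as ba, abab and aababb, matching the set \\left \\{ba, abab, aababb, ...\\right \\} given earlier.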
10260880@unknown@formal@none@1@S@The language defined above is not a context-free language, and this can be strictly proven using the [[pumping lemma for context-free languages]], but for example the language \\left \\{ a^{n}b^{n} | n \\ge 1 \\right \\} (at least 1 a followed by the same number of b's) is context-free, as it can be defined by the grammar G_2 with N=\\left \\{S\\right \\}, \\Sigma=\\left \\{a,b\\right \\}, S the start symbol, and the following production rules:@@@@1@74@@danf@17-8-2009 10260890@unknown@formal@none@1@S@: 1. S \\rightarrow aSb@@@@1@5@@danf@17-8-2009 10260900@unknown@formal@none@1@S@: 2. S \\rightarrow ab@@@@1@5@@danf@17-8-2009 10260910@unknown@formal@none@1@S@A context-free language can be recognized in O(n^3) time (''see'' [[Big O notation]]) by an algorithm such as [[Earley's algorithm]].@@@@1@20@@danf@17-8-2009 10260920@unknown@formal@none@1@S@That is, for every context-free language, a machine can be built that takes a string as input and determines in O(n^3) time whether the string is a member of the language, where n is the length of the string.@@@@1@39@@danf@17-8-2009 10260930@unknown@formal@none@1@S@Further, some important subsets of the context-free languages can be recognized in linear time using other algorithms.@@@@1@17@@danf@17-8-2009 10260940@unknown@formal@none@1@S@==== Regular grammars ====@@@@1@4@@danf@17-8-2009 10260950@unknown@formal@none@1@S@In [[regular grammar]]s, the left hand side is again only a single nonterminal symbol, but now the right-hand side is also restricted: It may be the empty string, or a single terminal symbol, or a single terminal symbol followed by a nonterminal symbol, but nothing else.@@@@1@46@@danf@17-8-2009 10260960@unknown@formal@none@1@S@(Sometimes a broader definition is used: one can allow longer strings of terminals or single nonterminals without anything else, making languages [[syntactic sugar|easier to denote]] while still defining the same class of languages.)@@@@1@33@@danf@17-8-2009 10260970@unknown@formal@none@1@S@The language defined above is not regular, but the language \\left \\{ a^{n}b^{m} \\,| \\, m,n \\ge 1 \\right \\} (at least 1 a followed by at least 1 b, where the numbers may be different) is, as it can be defined by the grammar G_3 with N=\\left \\{S, A,B\\right \\}, \\Sigma=\\left \\{a,b\\right \\}, S the start symbol, and the following production rules:@@@@1@63@@danf@17-8-2009 10260980@unknown@formal@none@1@S@:# S \\rightarrow aA@@@@1@4@@danf@17-8-2009 10260990@unknown@formal@none@1@S@:# A \\rightarrow aA@@@@1@4@@danf@17-8-2009 10261000@unknown@formal@none@1@S@:# A \\rightarrow bB@@@@1@4@@danf@17-8-2009 10261010@unknown@formal@none@1@S@:# B \\rightarrow bB@@@@1@4@@danf@17-8-2009 10261020@unknown@formal@none@1@S@:# B \\rightarrow \\epsilon@@@@1@4@@danf@17-8-2009 10261030@unknown@formal@none@1@S@All languages generated by a regular grammar can be recognized in linear time by a [[finite state machine]].@@@@1@18@@danf@17-8-2009 10261040@unknown@formal@none@1@S@Although, in practice, regular grammars are commonly expressed using [[regular expression]]s, some forms of regular expression used in practice do not strictly generate the regular languages and do not show linear recognitional performance due to those deviations.@@@@1@37@@danf@17-8-2009 10261050@unknown@formal@none@1@S@=== Other forms of generative grammars ===@@@@1@7@@danf@17-8-2009 10261060@unknown@formal@none@1@S@Many extensions and variations on Chomsky's original hierarchy of formal grammars have been developed more recently, both by 
linguists and by computer scientists, usually either in order to increase their expressive power or in order to make them easier to analyze or [[parsing|parse]].@@@@1@43@@danf@17-8-2009 10261070@unknown@formal@none@1@S@Some forms of grammars developed include:@@@@1@6@@danf@17-8-2009 10261080@unknown@formal@none@1@S@* [[Tree-adjoining grammar]]s increase the expressiveness of conventional generative grammars by allowing rewrite rules to operate on [[parse tree]]s instead of just strings.@@@@1@23@@danf@17-8-2009 10261090@unknown@formal@none@1@S@* [[Affix grammar]]s and [[attribute grammar]]s allow rewrite rules to be augmented with semantic attributes and operations, useful both for increasing grammar expressiveness and for constructing practical language translation tools.@@@@1@30@@danf@17-8-2009 10261100@unknown@formal@none@1@S@== Analytic grammars ==@@@@1@4@@danf@17-8-2009 10261110@unknown@formal@none@1@S@Though there is very little literature on [[parsing]] [[algorithms]], most of these algorithms assume that the language to be parsed is initially ''described'' by means of a ''generative'' formal grammar, and that the goal is to transform this generative grammar into a working parser.@@@@1@44@@danf@17-8-2009 10261120@unknown@formal@none@1@S@Strictly speaking, a generative grammar does not in any way correspond to the algorithm used to parse a language, and various algorithms have different restrictions on the form of production rules that are considered well-formed.@@@@1@35@@danf@17-8-2009 10261130@unknown@formal@none@1@S@An alternative approach is to formalize the language in terms of an analytic grammar in the first place, which more directly corresponds to the structure and semantics of a parser for the language.@@@@1@33@@danf@17-8-2009 10261140@unknown@formal@none@1@S@Examples of analytic grammar formalisms include the following:@@@@1@8@@danf@17-8-2009 10261150@unknown@formal@none@1@S@* [[The Language Machine]] directly implements unrestricted analytic grammars.@@@@1@9@@danf@17-8-2009 10261160@unknown@formal@none@1@S@Substitution rules are used to transform an input to produce outputs and behaviour.@@@@1@13@@danf@17-8-2009 10261170@unknown@formal@none@1@S@The system can also produce [http://languagemachine.sourceforge.net/picturebook.html the lm-diagram] which shows what happens when the rules of an unrestricted analytic grammar are being applied.@@@@1@23@@danf@17-8-2009 10261180@unknown@formal@none@1@S@* [[Top-down parsing language]] (TDPL): a highly minimalist analytic grammar formalism developed in the early 1970s to study the behavior of [[Top-down parsing|top-down parsers]].@@@@1@24@@danf@17-8-2009 10261190@unknown@formal@none@1@S@* [[Link grammar]]s: a form of analytic grammar designed for [[linguistics]], which derives syntactic structure by examining the positional relationships between pairs of words.@@@@1@24@@danf@17-8-2009 10261200@unknown@formal@none@1@S@* [[Parsing expression grammar]]s (PEGs): a more recent generalization of TDPL designed around the practical [[expressiveness]] needs of [[programming language]] and [[compiler]] writers.@@@@1@23@@danf@17-8-2009 10270010@unknown@formal@none@1@S@
Free software
@@@@1@2@@danf@17-8-2009 10270020@unknown@formal@none@1@S@'''Free software''' or software libre is [[software]] that can be used, studied, and modified without restriction, and which can be copied and redistributed in modified or unmodified form either without restriction, or with minimal restrictions only to ensure that further recipients can also do these things.@@@@1@46@@danf@17-8-2009 10270030@unknown@formal@none@1@S@In practice, for software to be distributed as free software, the human readable form of the program (the "[[source code]]") must be made available to the recipient along with a notice granting the above permissions.@@@@1@35@@danf@17-8-2009 10270040@unknown@formal@none@1@S@Such a notice is a "[[free software licence]]", or, in theory, could be a notice saying that the source code is released into the [[public domain]].@@@@1@26@@danf@17-8-2009 10270050@unknown@formal@none@1@S@The [[free software movement]] was conceived in 1983 by [[Richard Stallman]] to make these freedoms available to every computer user.@@@@1@20@@danf@17-8-2009 10270060@unknown@formal@none@1@S@From the late 1990s onward, [[alternative terms for free software]] came into use.@@@@1@13@@danf@17-8-2009 10270070@unknown@formal@none@1@S@"'''[[Open source software]]'''" is the most common such alternative term.@@@@1@10@@danf@17-8-2009 10270080@unknown@formal@none@1@S@Others include "'''software [[Gratis versus Libre|libre]]'''", "free, libre and open-source software" ("'''[[FOSS]]'''", or, with "libre", "'''FLOSS'''").@@@@1@16@@danf@17-8-2009 10270090@unknown@formal@none@1@S@The antonym of free software is "''[[proprietary software]]''" or ''non-free software''.@@@@1@11@@danf@17-8-2009 10270100@unknown@formal@none@1@S@Free software is distinct from "[[freeware]]" which is [[proprietary software]] made available free of charge.@@@@1@15@@danf@17-8-2009 10270110@unknown@formal@none@1@S@Users usually cannot study, modify, or redistribute freeware.@@@@1@8@@danf@17-8-2009 10270120@unknown@formal@none@1@S@Since free software may be freely redistributed, it generally is available at little or no cost.@@@@1@16@@danf@17-8-2009 10270130@unknown@formal@none@1@S@Free software business models are usually based on adding value such as support, training, customization, integration, or certification.@@@@1@18@@danf@17-8-2009 10270140@unknown@formal@none@1@S@At the same time, some business models which work with [[proprietary software]] are not compatible with free software, such as those that depend on a user paying for a licence in order to lawfully use a software product.@@@@1@38@@danf@17-8-2009 10270150@unknown@formal@none@1@S@== History ==@@@@1@3@@danf@17-8-2009 10270160@unknown@formal@none@1@S@In the 1950s, 1960s, and 1970s, it was normal for computer users to have the freedoms that are provided by free software.@@@@1@22@@danf@17-8-2009 10270170@unknown@formal@none@1@S@[[Software]] was commonly shared by individuals who used computers and by hardware manufacturers who were glad that people were making software that made their hardware useful.@@@@1@26@@danf@17-8-2009 10270180@unknown@formal@none@1@S@In the 1970s and early 1980s, the [[software industry]] began using technical measures (such as only distributing [[Executable|binary copies]] of [[computer programs]]) to prevent [[computer users]] from being able to study and modify software..@@@@1@34@@danf@17-8-2009 10270190@unknown@formal@none@1@S@In 1980 [[copyright]] law was extended to computer programs.@@@@1@9@@danf@17-8-2009 10270200@unknown@formal@none@1@S@In 1983, 
[[Richard Stallman]], longtime member of the [[hacker (free and open source software)|hacker]] community at the [[MIT Artificial Intelligence Laboratory]], announced the [[GNU project]], saying that he had become frustrated with the effects of the change in culture of the computer industry and its users.@@@@1@46@@danf@17-8-2009 10270210@unknown@formal@none@1@S@Software development for the [[GNU operating system]] began in January 1984, and the [[Free Software Foundation]] (FSF) was founded in October 1985.@@@@1@22@@danf@17-8-2009 10270220@unknown@formal@none@1@S@He developed a free software definition and the concept of "[[copyleft]]", designed to ensure software freedom for all.@@@@1@18@@danf@17-8-2009 10270230@unknown@formal@none@1@S@Free software is a widespread international concept, producing software used by individuals, large organizations, and governmental administrations.@@@@1@17@@danf@17-8-2009 10270240@unknown@formal@none@1@S@Free software has a very high market penetration in server-side Internet applications such as the [[Apache web server]], [[MySQL]] database, and [[PHP]] scripting language.@@@@1@24@@danf@17-8-2009 10270250@unknown@formal@none@1@S@Completely free computing environments are available as large packages of basic system software, such as the many [[GNU/Linux distribution]]s and [[FreeBSD]].@@@@1@21@@danf@17-8-2009 10270260@unknown@formal@none@1@S@Free software [[Software development|developers]] have also created free versions of almost all commonly used desktop applications, including Web browsers, office productivity suites, and multimedia players.@@@@1@25@@danf@17-8-2009 10270270@unknown@formal@none@1@S@It is important to note, however, that in many categories, free software for individual [[workstation]]s or home users has only a fraction of the market share of its proprietary competitors.@@@@1@30@@danf@17-8-2009 10270280@unknown@formal@none@1@S@Most free software is distributed [[online]] without charge, or [[off-line]] at the [[marginal cost]] of distribution, but this pricing model is not required, and people may sell copies of free software programs for any price.@@@@1@35@@danf@17-8-2009 10270290@unknown@formal@none@1@S@The economic viability of free software has been recognised by large corporations such as [[IBM]], [[Red Hat]], and [[Sun Microsystems]].@@@@1@20@@danf@17-8-2009 10270300@unknown@formal@none@1@S@Many companies whose core business is not in the IT sector choose free software for their Internet information and sales sites, due to the lower initial capital investment and ability to freely customize the application packages.@@@@1@36@@danf@17-8-2009 10270310@unknown@formal@none@1@S@Also, some non-software industries are beginning to use techniques similar to those used in free software development for their research and development process; scientists, for example, are looking towards more open development processes, and hardware such as microchips are beginning to be developed with specifications released under [[copyleft]] licenses (see the [[OpenCores]] project, for instance).@@@@1@55@@danf@17-8-2009 10270320@unknown@formal@none@1@S@[[Creative Commons]] and the [[free culture movement]] have also been largely influenced by the free software movement.@@@@1@17@@danf@17-8-2009 10270330@unknown@formal@none@1@S@===Naming===@@@@1@1@@danf@17-8-2009 10270340@unknown@formal@none@1@S@The FSF recommends using the term "free software" rather than "open source software" because that term and the associated marketing campaign focuses on the technical 
issues of software development, avoiding the issue of user freedoms.@@@@1@35@@danf@17-8-2009 10270350@unknown@formal@none@1@S@"[[Libre]]" is used to avoid the ambiguity of the word "free".@@@@1@11@@danf@17-8-2009 10270360@unknown@formal@none@1@S@However, amongst English speakers, ''libre'' is primarily only used within the free software movement.@@@@1@14@@danf@17-8-2009 10270370@unknown@formal@none@1@S@== Definition ==@@@@1@3@@danf@17-8-2009 10270380@unknown@formal@none@1@S@The first formal definition of free software was published by FSF in February 1986.@@@@1@14@@danf@17-8-2009 10270390@unknown@formal@none@1@S@That definition, written by Richard Stallman, is still maintained today and states that software is free software if people who receive a copy of the software have the following four freedoms:@@@@1@31@@danf@17-8-2009 10270400@unknown@formal@none@1@S@* Freedom 0: The freedom to run the program for any purpose.@@@@1@12@@danf@17-8-2009 10270410@unknown@formal@none@1@S@* Freedom 1: The freedom to study and modify the program.@@@@1@11@@danf@17-8-2009 10270420@unknown@formal@none@1@S@* Freedom 2: The freedom to copy the program so you can help your neighbor.@@@@1@15@@danf@17-8-2009 10270430@unknown@formal@none@1@S@* Freedom 3: The freedom to improve the program, and release your improvements to the public, so that the whole community benefits.@@@@1@22@@danf@17-8-2009 10270440@unknown@formal@none@1@S@Freedoms 1 and 3 require [[source code]] to be available because studying and modifying software without its source code is highly impractical.@@@@1@22@@danf@17-8-2009 10270450@unknown@formal@none@1@S@Thus, free software means that [[user (computing)|computer users]] have the freedom to cooperate with whom they choose, and to control the software they use.@@@@1@24@@danf@17-8-2009 10270460@unknown@formal@none@1@S@To summarize this into a remark distinguishing ''[[Gratis versus Libre|libre]]'' (freedom) software from ''[[Gratis versus Libre|gratis]]'' (zero price) software, [[Richard Stallman]] said: "''Free software is a matter of liberty, not price.@@@@1@31@@danf@17-8-2009 10270470@unknown@formal@none@1@S@To understand the concept, you should think of 'free' as in '[[free speech]]', not as in '[[free beer]]'''".@@@@1@18@@danf@17-8-2009 10270480@unknown@formal@none@1@S@In the late 90s, other groups published their own definitions which describe an almost identical set of software.@@@@1@18@@danf@17-8-2009 10270490@unknown@formal@none@1@S@The most notable are [[Debian Free Software Guidelines]] published in 1997, and the [[Open Source Definition]], published in 1998.@@@@1@19@@danf@17-8-2009 10270500@unknown@formal@none@1@S@The BSD-based operating systems, such as [[FreeBSD]], [[OpenBSD]], and [[NetBSD]], do not have their own formal definitions of free software.@@@@1@20@@danf@17-8-2009 10270510@unknown@formal@none@1@S@Users of these systems generally find the same set of software to be acceptable, but sometimes see copyleft as restrictive.@@@@1@20@@danf@17-8-2009 10270520@unknown@formal@none@1@S@They generally advocate [[permissive free software licenses]], which allow others to make software based on their source code, and then release the modified result as proprietary software.@@@@1@27@@danf@17-8-2009 10270530@unknown@formal@none@1@S@Their view is that this permissive approach is more free.@@@@1@10@@danf@17-8-2009 10270540@unknown@formal@none@1@S@The [[Kerberos (protocol)|Kerberos]], [[X.org]], and [[Apache License|Apache]] software licenses are substantially similar in intent and 
implementation.@@@@1@16@@danf@17-8-2009 10270550@unknown@formal@none@1@S@All of these software packages originated in academic institutions interested in wide technology transfer ([[University of California]], [[Massachusetts Institute of Technology|MIT]], and [[University of Illinois at Urbana-Champaign|UIUC]]).@@@@1@27@@danf@17-8-2009 10270560@unknown@formal@none@1@S@== Examples of free software ==@@@@1@6@@danf@17-8-2009 10270570@unknown@formal@none@1@S@The [[Free Software Directory]] is a free software project that maintains a large database of free software packages.@@@@1@18@@danf@17-8-2009 10270580@unknown@formal@none@1@S@===Notable free software===@@@@1@3@@danf@17-8-2009 10270590@unknown@formal@none@1@S@* [[Graphical user interface|GUI]] related@@@@1@5@@danf@17-8-2009 10270600@unknown@formal@none@1@S@**[[X Window System]]@@@@1@3@@danf@17-8-2009 10270610@unknown@formal@none@1@S@**[[GNOME]]@@@@1@1@@danf@17-8-2009 10270620@unknown@formal@none@1@S@**[[KDE]]@@@@1@1@@danf@17-8-2009 10270630@unknown@formal@none@1@S@**[[Xfce]] desktop environments@@@@1@3@@danf@17-8-2009 10270640@unknown@formal@none@1@S@* [[OpenOffice.org]] office suite@@@@1@4@@danf@17-8-2009 10270650@unknown@formal@none@1@S@* [[Mozilla Application Suite|Mozilla]] and [[Mozilla Firefox|Firefox]] web browsers.@@@@1@9@@danf@17-8-2009 10270660@unknown@formal@none@1@S@* Typesetting and document preparation systems@@@@1@6@@danf@17-8-2009 10270670@unknown@formal@none@1@S@**[[TeX]]@@@@1@1@@danf@17-8-2009 10270680@unknown@formal@none@1@S@**[[LaTeX]]@@@@1@1@@danf@17-8-2009 10270690@unknown@formal@none@1@S@* Graphics tools like [[GIMP]] image graphics editor and [[Blender (software)|Blender]] 3D animation program.@@@@1@14@@danf@17-8-2009 10270700@unknown@formal@none@1@S@* [[Text editor]]s like [[vi]] or [[emacs]].@@@@1@7@@danf@17-8-2009 10270710@unknown@formal@none@1@S@* [[ogg]] is a free software multimedia container, used to hold [[ogg vorbis]] sound and [[ogg theora]] video.@@@@1@18@@danf@17-8-2009 10270720@unknown@formal@none@1@S@* [[Relational database]] systems@@@@1@4@@danf@17-8-2009 10270730@unknown@formal@none@1@S@**[[MySQL]]@@@@1@1@@danf@17-8-2009 10270740@unknown@formal@none@1@S@**[[PostgreSQL]]@@@@1@1@@danf@17-8-2009 10270750@unknown@formal@none@1@S@* [[GNU Compiler Collection|GCC]] compilers, [[GDB]] debugger and the [[GNU C Library]].@@@@1@12@@danf@17-8-2009 10270760@unknown@formal@none@1@S@====Programming languages====@@@@1@2@@danf@17-8-2009 10270770@unknown@formal@none@1@S@*[[Java (programming language)|Java]]@@@@1@3@@danf@17-8-2009 10270780@unknown@formal@none@1@S@*[[Perl]]@@@@1@1@@danf@17-8-2009 10270790@unknown@formal@none@1@S@*[[PHP]]@@@@1@1@@danf@17-8-2009 10270800@unknown@formal@none@1@S@*[[Python (programming language)|Python]]@@@@1@3@@danf@17-8-2009 10270810@unknown@formal@none@1@S@*[[Lua (programming language)|Lua]]@@@@1@3@@danf@17-8-2009 10270820@unknown@formal@none@1@S@*[[Ruby programming language|Ruby]]@@@@1@3@@danf@17-8-2009 10270830@unknown@formal@none@1@S@*[[Tcl]]@@@@1@1@@danf@17-8-2009 10270840@unknown@formal@none@1@S@====Servers====@@@@1@1@@danf@17-8-2009 10270850@unknown@formal@none@1@S@*[[Apache HTTP Server|Apache web server]]@@@@1@5@@danf@17-8-2009 10270860@unknown@formal@none@1@S@*[[BIND]] name server@@@@1@3@@danf@17-8-2009 10270870@unknown@formal@none@1@S@*[[Sendmail]] mail transport@@@@1@3@@danf@17-8-2009 10270880@unknown@formal@none@1@S@*[[Samba software|Samba]] file server.@@@@1@4@@danf@17-8-2009 10270890@unknown@formal@none@1@S@====Operating systems====@@@@1@2@@danf@17-8-2009 
10270900@unknown@formal@none@1@S@*[[GNU/Linux]]@@@@1@1@@danf@17-8-2009 10270910@unknown@formal@none@1@S@*[[Berkeley Software Distribution|BSD]]@@@@1@3@@danf@17-8-2009 10270920@unknown@formal@none@1@S@*[[Darwin (operating system)|Darwin]]@@@@1@3@@danf@17-8-2009 10270930@unknown@formal@none@1@S@*[[OpenSolaris]]@@@@1@1@@danf@17-8-2009 10270940@unknown@formal@none@1@S@== Free software licenses ==@@@@1@5@@danf@17-8-2009 10270950@unknown@formal@none@1@S@All free software licenses must grant people all the freedoms discussed above.@@@@1@12@@danf@17-8-2009 10270960@unknown@formal@none@1@S@However, unless the applications' licenses are compatible, combining programs by mixing source code or directly linking binaries is problematic because of license technicalities.@@@@1@23@@danf@17-8-2009 10270970@unknown@formal@none@1@S@Programs indirectly connected together may avoid this problem.@@@@1@8@@danf@17-8-2009 10270980@unknown@formal@none@1@S@The majority of free software uses a small set of licenses.@@@@1@11@@danf@17-8-2009 10270990@unknown@formal@none@1@S@The most popular of these licenses are:@@@@1@7@@danf@17-8-2009 10271000@unknown@formal@none@1@S@* the [[GNU General Public License]]@@@@1@6@@danf@17-8-2009 10271010@unknown@formal@none@1@S@* the [[GNU Lesser General Public License]]@@@@1@7@@danf@17-8-2009 10271020@unknown@formal@none@1@S@* the [[BSD License]]@@@@1@4@@danf@17-8-2009 10271030@unknown@formal@none@1@S@* the [[Mozilla Public License]]@@@@1@5@@danf@17-8-2009 10271040@unknown@formal@none@1@S@* the [[MIT License]]@@@@1@4@@danf@17-8-2009 10271050@unknown@formal@none@1@S@* the [[Apache License]]@@@@1@4@@danf@17-8-2009 10271060@unknown@formal@none@1@S@The Free Software Foundation and the Open Source Initiative both publish lists of licenses that they find to comply with their own definitions of free software and open-source software respectively.@@@@1@30@@danf@17-8-2009 10271070@unknown@formal@none@1@S@* [[List of FSF approved software licenses]]@@@@1@7@@danf@17-8-2009 10271080@unknown@formal@none@1@S@* [[List of OSI approved software licenses]]@@@@1@7@@danf@17-8-2009 10271090@unknown@formal@none@1@S@These lists are necessarily incomplete, because a license need not be known by either organization in order to provide these freedoms.@@@@1@21@@danf@17-8-2009 10271100@unknown@formal@none@1@S@Apart from these two organizations, the [[Debian]] project is seen by some to provide useful advice on whether particular licenses comply with their [[Debian Free Software Guidelines]].@@@@1@27@@danf@17-8-2009 10271110@unknown@formal@none@1@S@Debian does not publish a list of ''approved'' licenses, so its judgments have to be tracked by checking which software it has allowed into its software archives.@@@@1@26@@danf@17-8-2009 10271120@unknown@formal@none@1@S@That is summarized at the Debian web site.@@@@1@8@@danf@17-8-2009 10271130@unknown@formal@none@1@S@However, it is rare for a license to be announced as being in compliance with either FSF or OSI guidelines and not [[Vice_versa#vice_versa|vice versa]] (the [[Netscape Public License]] used for early versions of Mozilla being an exception), so exact definitions of the terms have not become contentious issues.@@@@1@46@@danf@17-8-2009 10271140@unknown@formal@none@1@S@=== Permissive and copyleft licenses ===@@@@1@6@@danf@17-8-2009 10271150@unknown@formal@none@1@S@The FSF categorizes licenses in the following ways:@@@@1@8@@danf@17-8-2009 10271160@unknown@formal@none@1@S@* [[Public domain]] software - the copyright has expired, the work was not copyrighted, or the author 
has abandoned the copyright.@@@@1@21@@danf@17-8-2009 10271170@unknown@formal@none@1@S@Since public-domain software lacks copyright protection, it may be freely incorporated into any work, whether proprietary or free.@@@@1@18@@danf@17-8-2009 10271180@unknown@formal@none@1@S@* [[permissive free software licences|Permissive licenses]], also called BSD-style because they are applied to much of the software distributed with the [[Berkeley Software Distribution|BSD]] operating systems.@@@@1@26@@danf@17-8-2009 10271190@unknown@formal@none@1@S@The author retains copyright solely to disclaim warranty and require proper attribution of modified works, but permits redistribution and modification in ''any'' work, even a proprietary one.@@@@1@26@@danf@17-8-2009 10271200@unknown@formal@none@1@S@* [[Copyleft]] licenses, the [[GNU General Public License]] being the most prominent.@@@@1@12@@danf@17-8-2009 10271210@unknown@formal@none@1@S@The author retains copyright and permits redistribution and modification provided all such redistribution is licensed under the same license.@@@@1@19@@danf@17-8-2009 10271220@unknown@formal@none@1@S@Additions and modifications by others must also be licensed under the same 'copyleft' license whenever they are distributed with part of the original licensed product.@@@@1@25@@danf@17-8-2009 10271230@unknown@formal@none@1@S@== Security and reliability ==@@@@1@4@@danf@17-8-2009 10271240@unknown@formal@none@1@S@There is debate over the [[computer security|security]] of free software in comparison to proprietary software, with a major issue being [[security through obscurity]].@@@@1@23@@danf@17-8-2009 10271250@unknown@formal@none@1@S@A popular quantitative test in computer security is to compare the counts of known unpatched security flaws.@@@@1@16@@danf@17-8-2009 10271260@unknown@formal@none@1@S@Generally, users of this method advise avoiding products which lack fixes for known security flaws, at least until a fix is available.@@@@1@22@@danf@17-8-2009 10271270@unknown@formal@none@1@S@Some claim that this method is biased, counting more vulnerabilities for free software because its source code is accessible and its community is more forthcoming about what problems exist.@@@@1@31@@danf@17-8-2009 10271280@unknown@formal@none@1@S@Free software advocates counter that even if proprietary software does not have "published" flaws, flaws could still exist and possibly be known to malicious users.@@@@1@25@@danf@17-8-2009 10271290@unknown@formal@none@1@S@The ability of users to view and modify the source code allows many more people to analyse the code, potentially finding bugs and flaws at a higher rate than an average-sized corporation could manage.@@@@1@39@@danf@17-8-2009 10271300@unknown@formal@none@1@S@Users having access to the source code also makes creating and deploying [[spyware]] far more difficult.@@@@1@16@@danf@17-8-2009 10271310@unknown@formal@none@1@S@[[David A. 
Wheeler]] has published research concluding that free software is quantitatively more reliable than proprietary software.@@@@1@17@@danf@17-8-2009 10271320@unknown@formal@none@1@S@== Adoption ==@@@@1@3@@danf@17-8-2009 10271330@unknown@formal@none@1@S@Free software played a part in the development of the Internet, the World Wide Web and the infrastructure of [[dot-com companies]].@@@@1@21@@danf@17-8-2009 10271340@unknown@formal@none@1@S@Free software allows users to cooperate in enhancing and refining the programs they use; free software is a [[pure public good]] rather than a [[private good]].@@@@1@26@@danf@17-8-2009 10271350@unknown@formal@none@1@S@Companies that contribute to free software can increase commercial [[innovation]] without the burden of [[patent]] [[cross licensing]] lawsuits.@@@@1@18@@danf@17-8-2009 10271360@unknown@formal@none@1@S@(See [[Mpeg2#Patent holders|mpeg2 patent holders]])@@@@1@5@@danf@17-8-2009 10271370@unknown@formal@none@1@S@Under the free software business model, free software vendors may charge a fee for distribution and offer paid support and software customization services.@@@@1@23@@danf@17-8-2009 10271380@unknown@formal@none@1@S@Proprietary software uses a different business model, where a customer of the proprietary software pays a fee for a license to use the software.@@@@1@24@@danf@17-8-2009 10271390@unknown@formal@none@1@S@This license may grant the customer the ability to configure some, or none, of the software themselves.@@@@1@18@@danf@17-8-2009 10271400@unknown@formal@none@1@S@Often some level of support is included in the purchase of proprietary software, but additional support services (especially for enterprise applications) are usually available for an additional fee.@@@@1@28@@danf@17-8-2009 10271410@unknown@formal@none@1@S@Some proprietary software vendors will also customize software for a fee.@@@@1@11@@danf@17-8-2009 10271420@unknown@formal@none@1@S@Free software is generally available at little to no cost and can result in permanently lower costs compared to [[proprietary software]].@@@@1@21@@danf@17-8-2009 10271430@unknown@formal@none@1@S@With free software, businesses can fit software to their specific needs by changing the software themselves or by hiring programmers to modify it for them.@@@@1@25@@danf@17-8-2009 10271440@unknown@formal@none@1@S@Free software often has no warranty, and more importantly, generally does not assign legal liability to anyone.@@@@1@17@@danf@17-8-2009 10271450@unknown@formal@none@1@S@However, warranties are permitted between any two parties regarding the condition of the software and its usage.@@@@1@17@@danf@17-8-2009 10271460@unknown@formal@none@1@S@Such an agreement is made separately from the free software license.@@@@1@11@@danf@17-8-2009 10271470@unknown@formal@none@1@S@== Controversies ==@@@@1@3@@danf@17-8-2009 10271480@unknown@formal@none@1@S@=== Binary blobs ===@@@@1@4@@danf@17-8-2009 10271490@unknown@formal@none@1@S@In 2006, [[OpenBSD]] started the first campaign against the use of [[binary blobs]] in [[kernel (computer science)|kernels]].@@@@1@17@@danf@17-8-2009 10271500@unknown@formal@none@1@S@Blobs are usually freely distributable [[device driver]]s for hardware from vendors that do not reveal driver source code to users or developers.@@@@1@22@@danf@17-8-2009 10271510@unknown@formal@none@1@S@This restricts the users' freedom to effectively modify the software and distribute modified versions.@@@@1@14@@danf@17-8-2009 10271520@unknown@formal@none@1@S@Also, since the blobs are undocumented and may have 
[[computer bug|bugs]], they pose a security risk to any [[operating system]] whose kernel includes them.@@@@1@24@@danf@17-8-2009 10271530@unknown@formal@none@1@S@The proclaimed aim of the campaign against blobs is to collect hardware documentation that allows developers to write free software drivers for that hardware, ultimately enabling all free operating systems to become or remain blob-free.@@@@1@35@@danf@17-8-2009 10271540@unknown@formal@none@1@S@The issue of binary blobs in the [[Linux kernel]] and other device drivers motivated some developers in Ireland to launch [[gNewSense]], a GNU/Linux distribution with all the binary blobs removed.@@@@1@30@@danf@17-8-2009 10271550@unknown@formal@none@1@S@The project received support from the [[Free Software Foundation]].@@@@1@9@@danf@17-8-2009 10271560@unknown@formal@none@1@S@=== BitKeeper ===@@@@1@3@@danf@17-8-2009 10271570@unknown@formal@none@1@S@[[Larry McVoy]] invited high-profile free software projects to use his proprietary [[versioning system]], [[BitKeeper]], free of charge, in order to attract paying users.@@@@1@23@@danf@17-8-2009 10271580@unknown@formal@none@1@S@In 2002, Linux coordinator [[Linus Torvalds]] decided to use BitKeeper to develop the Linux kernel, a free software project, claiming that no free software alternative met his needs.@@@@1@27@@danf@17-8-2009 10271590@unknown@formal@none@1@S@This controversial decision drew criticism from several sources, including the Free Software Foundation's founder Richard Stallman.@@@@1@16@@danf@17-8-2009 10271600@unknown@formal@none@1@S@Following the apparent [[reverse engineering]] of BitKeeper's protocols, McVoy withdrew permission for gratis use by free software projects, leading the Linux kernel community to develop a free software replacement in [[Git (software)|Git]].@@@@1@32@@danf@17-8-2009 10271610@unknown@formal@none@1@S@=== Patent deals ===@@@@1@4@@danf@17-8-2009 10271620@unknown@formal@none@1@S@In November 2006, the [[Microsoft]] and [[Novell]] software corporations announced a controversial partnership involving, among other things, patent protection for some customers of Novell under certain conditions.@@@@1@27@@danf@17-8-2009 10280010@unknown@formal@none@1@S@
Freeware
@@@@1@1@@danf@17-8-2009 10280020@unknown@formal@none@1@S@'''Freeware''' is computer [[software]] that is available for use at no cost or for an optional fee.@@@@1@17@@danf@17-8-2009 10280030@unknown@formal@none@1@S@Freeware is often made available in a binary-only, [[proprietary software|proprietary]] form, thus making it distinct from [[free software]].@@@@1@18@@danf@17-8-2009 10280040@unknown@formal@none@1@S@Proprietary freeware allows authors to contribute something for the benefit of the community, while at the same time allowing them to retain control of the source code and preserve its business potential.@@@@1@32@@danf@17-8-2009 10280050@unknown@formal@none@1@S@Freeware is different from [[shareware]], where the user is obliged to pay (e.g. after some trial period or for additional functionality).@@@@1@21@@danf@17-8-2009 10280060@unknown@formal@none@1@S@== History ==@@@@1@3@@danf@17-8-2009 10280070@unknown@formal@none@1@S@The term ''freeware'' was coined by [[Andrew Fluegelman]] when he wanted to sell a communications program named [[PC-Talk]] that he had created but for which he did not wish to use traditional methods of distribution because of their cost.@@@@1@39@@danf@17-8-2009 10280080@unknown@formal@none@1@S@Fluegelman actually distributed PC-Talk via a process now referred to as [[shareware]].@@@@1@12@@danf@17-8-2009 10280090@unknown@formal@none@1@S@Current use of the term ''freeware'' does not necessarily match the original concept intended by Andrew Fluegelman.@@@@1@16@@danf@17-8-2009 10280100@unknown@formal@none@1@S@== Criteria ==@@@@1@3@@danf@17-8-2009 10280110@unknown@formal@none@1@S@The only criterion for being classified as freeware is that the software must be fully functional for an unlimited time with no monetary cost.@@@@1@24@@danf@17-8-2009 10280120@unknown@formal@none@1@S@The software license may impose one or more other restrictions on the type of use, including personal use, individual use, non-profit use, non-commercial use, academic use, commercial use, or any combination of these.@@@@1@33@@danf@17-8-2009 10280130@unknown@formal@none@1@S@For instance, the license may be "free for personal, non-commercial use."@@@@1@11@@danf@17-8-2009 10280140@unknown@formal@none@1@S@Everything created with freeware programs can be distributed at no cost (for example graphics, documents, or sounds made by the user).@@@@1@21@@danf@17-8-2009 10290010@unknown@formal@none@1@S@
French language
@@@@1@2@@danf@17-8-2009 10290020@unknown@formal@none@1@S@'''French''' (''français'') is today spoken around the world by 72 to 130 million people as a [[first language|native]] language, and by about 190 to 600 million people as a [[second language|second]] or third language, with significant numbers of speakers in 54 countries.@@@@1@41@@danf@17-8-2009 10290030@unknown@formal@none@1@S@Most native speakers of the language live in [[France]], where the language originated.@@@@1@13@@danf@17-8-2009 10290040@unknown@formal@none@1@S@Most of the rest live in [[Canada]], [[Belgium]] and [[Switzerland]].@@@@1@8@@danf@17-8-2009 10290050@unknown@formal@none@1@S@French is a descendant of the [[Latin]] language of the [[Roman Empire]], as are languages such as [[Portuguese language|Portuguese]], [[Spanish language|Spanish]], [[Italian language|Italian]], [[Catalan language|Catalan]] and [[Romanian language|Romanian]].@@@@1@28@@danf@17-8-2009 10290060@unknown@formal@none@1@S@Its development was also influenced by the native [[Celtic languages]] of Roman [[Gaul]] and by the [[Germanic languages|Germanic]] language of the post-Roman [[Frankish]] invaders.@@@@1@24@@danf@17-8-2009 10290070@unknown@formal@none@1@S@It is an [[official language]] in [[List of countries where French is an official language|29 countries]], most of which form what is called in French ''La [[Francophonie]]'', the community of French-speaking nations.@@@@1@32@@danf@17-8-2009 10290080@unknown@formal@none@1@S@It is an official language of all [[United Nations]] agencies and a [[List of international organisations which have French as an official language|large number of international organizations]].@@@@1@27@@danf@17-8-2009 10290090@unknown@formal@none@1@S@According to the [[European Union]], 129 million people (26% of the EU's 497,198,740 inhabitants) in 27 member states speak French, of whom 59 million (12%) speak it natively and 69 million (14%) claim to speak it as a second language, which makes it the third most spoken second language in the Union, after English and German.@@@@1@55@@danf@17-8-2009 10290100@unknown@formal@none@1@S@== Geographic distribution ==@@@@1@3@@danf@17-8-2009 10290110@unknown@formal@none@1@S@===Europe===@@@@1@1@@danf@17-8-2009 10290120@unknown@formal@none@1@S@====Legal status in France====@@@@1@4@@danf@17-8-2009 10290130@unknown@formal@none@1@S@Per the [[Constitution of France]], French has been the official language since 1992 (although previous legal texts have made it official since 1539; see the [[ordinance of Villers-Cotterêts]]).@@@@1@27@@danf@17-8-2009 10290140@unknown@formal@none@1@S@[[France]] mandates the use of French in official government publications, public [[education]] outside of specific cases (though these provisions are often ignored) and legal [[contract]]s; [[advertisement]]s must bear a translation of foreign words.@@@@1@33@@danf@17-8-2009 10290150@unknown@formal@none@1@S@In addition to French, there are also a variety of regional languages.@@@@1@12@@danf@17-8-2009 10290160@unknown@formal@none@1@S@France has signed the European Charter for Regional Languages but has not ratified it, since that would go against the 1958 Constitution.@@@@1@22@@danf@17-8-2009 10290170@unknown@formal@none@1@S@====Switzerland====@@@@1@1@@danf@17-8-2009 10290180@unknown@formal@none@1@S@French is one of the four official languages of [[Switzerland]] (along with [[German language|German]], [[Italian language|Italian]], and [[Romansh language|Romansh]]) and is spoken in the part of Switzerland called 
''[[Romandie]]''.@@@@1@29@@danf@17-8-2009 10290190@unknown@formal@none@1@S@French is the native language of about 20% of the Swiss population.@@@@1@12@@danf@17-8-2009 10290200@unknown@formal@none@1@S@====Belgium====@@@@1@1@@danf@17-8-2009 10290210@unknown@formal@none@1@S@In [[Belgium]], French is the official language of [[Wallonia]] (excluding the [[East Cantons]], which are [[German language|German-speaking]]) and one of the two official languages—along with [[Dutch language|Dutch]]—of the [[Brussels-Capital Region]], where it is spoken by the majority of the population, though often not as their primary language.@@@@1@47@@danf@17-8-2009 10290220@unknown@formal@none@1@S@French and German are neither official languages nor recognised minority languages in the [[Flemish Region]], although along the borders with the Walloon and Brussels-Capital regions there are a dozen [[municipalities with language facilities]] for French-speakers; a mirroring situation exists for the Walloon Region with respect to the Dutch and German languages.@@@@1@51@@danf@17-8-2009 10290230@unknown@formal@none@1@S@In total, native French-speakers make up about 40% of the country's population and the remaining 60% speak Dutch; of the latter, 59% claim to speak French as a second language.@@@@1@30@@danf@17-8-2009 10290240@unknown@formal@none@1@S@French is thus known by an estimated 75% of all Belgians, either as a mother tongue or as a second or third language.@@@@1@22@@danf@17-8-2009 10290250@unknown@formal@none@1@S@====Monaco and Andorra====@@@@1@3@@danf@17-8-2009 10290260@unknown@formal@none@1@S@Although [[Monégasque language|Monégasque]] is the national language of the [[Principality of Monaco]], French is the only official language, and French nationals make up some 47% of the population.@@@@1@28@@danf@17-8-2009 10290270@unknown@formal@none@1@S@[[Catalan language|Catalan]] is the only official language of [[Andorra]]; however, French is commonly used due to the proximity to France.@@@@1@20@@danf@17-8-2009 10290280@unknown@formal@none@1@S@French nationals make up 7% of the population.@@@@1@8@@danf@17-8-2009 10290290@unknown@formal@none@1@S@====Italy====@@@@1@1@@danf@17-8-2009 10290300@unknown@formal@none@1@S@French is also an official language, along with [[Italian language|Italian]], in the province of [[Aosta Valley]], [[Italy]].@@@@1@17@@danf@17-8-2009 10290310@unknown@formal@none@1@S@In addition, a number of [[Franco-Provençal language|Franco-Provençal]] dialects are spoken in the province, although they do not have official recognition.@@@@1@20@@danf@17-8-2009 10290320@unknown@formal@none@1@S@====Luxembourg====@@@@1@1@@danf@17-8-2009 10290330@unknown@formal@none@1@S@French is one of three official languages of [[Luxembourg|the Grand Duchy of Luxembourg]];@@@@1@15@@danf@17-8-2009 10290340@unknown@formal@none@1@S@the other official languages of Luxembourg are@@@@1@7@@danf@17-8-2009 10290350@unknown@formal@none@1@S@*[[German language|German]]@@@@1@2@@danf@17-8-2009 10290360@unknown@formal@none@1@S@*[[Lëtzebuergesch|Luxemburgish]].@@@@1@1@@danf@17-8-2009 10290370@unknown@formal@none@1@S@Luxemburgish is the natively spoken language of Luxembourg;@@@@1@8@@danf@17-8-2009 10290380@unknown@formal@none@1@S@Luxembourg's education system is trilingual: the first years of primary school are in Luxembourgish, before changing to German, while in secondary school the language of instruction changes to French.@@@@1@28@@danf@17-8-2009 10290390@unknown@formal@none@1@S@====The Channel Islands====@@@@1@3@@danf@17-8-2009 
10290400@unknown@formal@none@1@S@Although [[Jersey]] and [[Guernsey]], the two bailiwicks collectively referred to as the [[Channel Islands]], are separate entities, both use French to some degree, mostly in an administrative capacity.@@@@1@28@@danf@17-8-2009 10290410@unknown@formal@none@1@S@[[Jersey Legal French]] is the standardized variety used in Jersey.@@@@1@10@@danf@17-8-2009 10290420@unknown@formal@none@1@S@===The Americas===@@@@1@2@@danf@17-8-2009 10290430@unknown@formal@none@1@S@====Legal status in Canada====@@@@1@4@@danf@17-8-2009 10290440@unknown@formal@none@1@S@About 7 million [[Canadian]]s are native French-speakers, of whom 6 million live in [[Quebec]], and French is one of [[Canada]]'s two official languages (the other being [[English language|English]]).@@@@1@28@@danf@17-8-2009 10290450@unknown@formal@none@1@S@Various provisions of the [[Canadian Charter of Rights and Freedoms]] deal with Canadians' right to access services in both languages, including the right to a publicly funded education in the minority language of each province, where numbers warrant in a given locality.@@@@1@42@@danf@17-8-2009 10290460@unknown@formal@none@1@S@By [[law]], the federal government must operate and provide services in both English and French, proceedings of the [[Parliament of Canada]] must be translated into both these languages, and most products sold in Canada must have labeling in both languages.@@@@1@40@@danf@17-8-2009 10290470@unknown@formal@none@1@S@Overall, about 13% of Canadians have knowledge of French only, while 18% have knowledge of both English and French.@@@@1@19@@danf@17-8-2009 10290480@unknown@formal@none@1@S@In contrast, over 82% of the population of Quebec speaks French natively, and almost 96% speak it as either their first or second language.@@@@1@24@@danf@17-8-2009 10290490@unknown@formal@none@1@S@It has been the sole official language of Quebec since 1974.@@@@1@11@@danf@17-8-2009 10290500@unknown@formal@none@1@S@The legal status of French was further strengthened with the 1977 adoption of the [[Charter of the French Language]] (popularly known as ''Bill 101''), which guarantees that every person has a right to have the civil administration, the health and social services, corporations, and enterprises in Quebec communicate with him in French.@@@@1@52@@danf@17-8-2009 10290510@unknown@formal@none@1@S@While the Charter mandates that certain provincial government services, such as those relating to health and education, be offered to the English minority in its language, where numbers warrant, its primary purpose is to cement the role of French as the primary language used in the public sphere.@@@@1@48@@danf@17-8-2009 10290520@unknown@formal@none@1@S@[[Image:Knowledge French EU map.png|right|thumb|240px|Knowledge of French in the European Union and candidate countries]] The provision of the Charter that has arguably had the most significant impact mandates French-language [[education]] unless a child's parents or siblings have received the majority of their own primary education in English within Canada, with minor exceptions.@@@@1@51@@danf@17-8-2009 10290530@unknown@formal@none@1@S@This measure has reversed a historical trend whereby a large number of immigrant children would attend English schools.@@@@1@18@@danf@17-8-2009 10290540@unknown@formal@none@1@S@In so doing, the Charter has greatly contributed to the "visage français" (French face) of Montreal in spite of its growing immigrant population.@@@@1@23@@danf@17-8-2009 10290550@unknown@formal@none@1@S@Other 
provisions of the Charter have been ruled unconstitutional over the years, including those mandating French-only commercial signs, court proceedings, and debates in the legislature.@@@@1@25@@danf@17-8-2009 10290560@unknown@formal@none@1@S@Though none of these provisions are still in effect today, some continued to be on the books for a time even after courts had ruled them unconstitutional as a result of the government's decision to invoke the so-called [[Section Thirty-three of the Canadian Charter of Rights and Freedoms|notwithstanding clause]] of the Canadian constitution to override constitutional requirements.@@@@1@57@@danf@17-8-2009 10290570@unknown@formal@none@1@S@In 1993, the Charter was rewritten to allow signage in other languages so long as French was markedly "predominant."@@@@1@19@@danf@17-8-2009 10290580@unknown@formal@none@1@S@Another section of the Charter guarantees every person the right to work in French, meaning the right to have all communications with one's superiors and coworkers in French, as well as the right not to be required to know another language as a condition of hiring, unless this is warranted by the nature of one's duties, such as by reason of extensive interaction with people located outside the province or similar reasons.@@@@1@72@@danf@17-8-2009 10290590@unknown@formal@none@1@S@This section has not been as effective as had originally been hoped, and has faded somewhat from public consciousness.@@@@1@19@@danf@17-8-2009 10290600@unknown@formal@none@1@S@As of 2006, approximately 65% of the workforce on the island of Montreal predominantly used French in the workplace.@@@@1@19@@danf@17-8-2009 10290610@unknown@formal@none@1@S@The only other province that recognizes French as an official language is [[New Brunswick]], which is officially bilingual, like the nation as a whole.@@@@1@24@@danf@17-8-2009 10290620@unknown@formal@none@1@S@Outside of [[Quebec]], the highest number of Francophones in Canada, 485,000, excluding those who claim multiple mother tongues, reside in [[Ontario]], whereas [[New Brunswick]], home to the vast majority of [[Acadians]], has the highest ''percentage'' of Francophones after [[Quebec]], 33%, or 237,000.@@@@1@42@@danf@17-8-2009 10290630@unknown@formal@none@1@S@In [[Ontario]], [[Nova Scotia]], [[Prince Edward Island]], and [[Manitoba]], French does not have full official status, although the provincial governments do provide some French-language services in all communities where significant numbers of Francophones live.@@@@1@34@@danf@17-8-2009 10290640@unknown@formal@none@1@S@Canada's three northern territories ([[Yukon]], [[Northwest Territories]], and [[Nunavut]]) all recognize French as an official language as well.@@@@1@18@@danf@17-8-2009 10290650@unknown@formal@none@1@S@All provinces make some effort to accommodate the needs of their Francophone [[citizen]]s, although the level and quality of French-language service vary significantly from province to province.@@@@1@27@@danf@17-8-2009 10290660@unknown@formal@none@1@S@The Ontario [[French Language Services Act]], adopted in 1986, guarantees French language services in that province in regions where the Francophone population exceeds 10% of the total population, as well as communities with Francophone populations exceeding 5,000, and certain other designated areas; this has the most effect in the north and east of the province, as well as in other larger centres such as [[Ottawa]], [[Toronto]], [[Hamilton, Ontario|Hamilton]], [[Mississauga, Ontario|Mississauga]], [[London, 
Ontario|London]], [[Kitchener, Ontario|Kitchener]], [[St. Catharines, Ontario|St. Catharines]], [[Greater Sudbury]] and [[Windsor, Ontario|Windsor]].@@@@1@83@@danf@17-8-2009 10290670@unknown@formal@none@1@S@However, the French Language Services Act does not confer the status of "official bilingualism" on these cities, as that designation carries with it implications which go beyond the provision of services in both languages.@@@@1@34@@danf@17-8-2009 10290680@unknown@formal@none@1@S@The City of Ottawa's language policy (by-law 2001-170) allows employees to work in their official language of choice and be supervised in the language of choice.@@@@1@26@@danf@17-8-2009 10290690@unknown@formal@none@1@S@Canada has the status of member state in the Francophonie, while the provinces of Quebec and New Brunswick are recognized as participating governments.@@@@1@23@@danf@17-8-2009 10290700@unknown@formal@none@1@S@Ontario is currently seeking to become a full member on its own.@@@@1@12@@danf@17-8-2009 10290710@unknown@formal@none@1@S@====Haiti====@@@@1@1@@danf@17-8-2009 10290720@unknown@formal@none@1@S@French is an official language of [[Haiti]], although it is mostly spoken by the [[upper class]], while [[Haitian Creole]] (a [[French-based creole language]]) is more widely spoken as a [[mother tongue]].@@@@1@31@@danf@17-8-2009 10290730@unknown@formal@none@1@S@====French overseas territories====@@@@1@3@@danf@17-8-2009 10290740@unknown@formal@none@1@S@French is also the official language in France's overseas territories of [[French Guiana]], [[Guadeloupe]], [[Martinique]], [[Saint Barthélemy]], [[Saint Martin (France)|St. Martin]] and [[Saint-Pierre and Miquelon]].@@@@1@25@@danf@17-8-2009 10290750@unknown@formal@none@1@S@====The United States====@@@@1@3@@danf@17-8-2009 10290760@unknown@formal@none@1@S@Although it has no official recognition on a federal level, French is the third most-spoken language in the United States, after [[English language|English]] and [[Spanish language|Spanish]], and the second most-spoken in the states of [[Louisiana]], [[Maine]], [[Vermont]] and [[New Hampshire]].@@@@1@40@@danf@17-8-2009 10290770@unknown@formal@none@1@S@Louisiana is home to two distinct dialects, [[Cajun French]] and [[Louisiana Creole French|Creole French]]@@@@1@14@@danf@17-8-2009 10290780@unknown@formal@none@1@S@===Africa===@@@@1@1@@danf@17-8-2009 10290790@unknown@formal@none@1@S@A majority of the world's French-speaking population lives in Africa.@@@@1@10@@danf@17-8-2009 10290800@unknown@formal@none@1@S@According to the 2007 report by the Organisation internationale de la Francophonie, an estimated 115 million African people spread across 31 francophone African countries can speak French either as a [[first language|first]] or [[second language]].@@@@1@35@@danf@17-8-2009 10290810@unknown@formal@none@1@S@French is mostly a second language in Africa, but in some areas it has become a first language, such as in the region of [[Abidjan]], [[Côte d'Ivoire]] and in [[Libreville]], [[Gabon]].@@@@1@31@@danf@17-8-2009 10290820@unknown@formal@none@1@S@It is impossible to speak of a single form of [[African French]], but rather of diverse forms of African French which have developed due to the contact with many indigenous [[African languages]].@@@@1@32@@danf@17-8-2009 10290830@unknown@formal@none@1@S@In the territories of the [[Indian Ocean]], the French language is often spoken alongside French-derived creole languages, the major exception being [[Madagascar]].@@@@1@22@@danf@17-8-2009 
10290840@unknown@formal@none@1@S@There, a Malayo-Polynesian language ([[Malagasy]]) is spoken alongside French.@@@@1@9@@danf@17-8-2009 10290850@unknown@formal@none@1@S@The French language has also met competition with English since English has been the official language in [[Mauritius]] and the [[Seychelles]] for a long time and has recently become an official language of Madagascar.@@@@1@34@@danf@17-8-2009 10290860@unknown@formal@none@1@S@[[Sub-Saharan Africa]] is the region where the French language is most likely to expand due to the expansion of education and it is also there the language has evolved most in recent years.@@@@1@33@@danf@17-8-2009 10290870@unknown@formal@none@1@S@Some vernacular forms of French in Africa can be difficult to understand for French speakers from other countries but written forms of the language are very closely related to those of the rest of the French-speaking world.@@@@1@37@@danf@17-8-2009 10290880@unknown@formal@none@1@S@French is an official language of many African countries, most of them former French or [[Belgian colonial empire|Belgian colonies]]:@@@@1@19@@danf@17-8-2009 10290890@unknown@formal@none@1@S@:*[[Benin]]@@@@1@1@@danf@17-8-2009 10290900@unknown@formal@none@1@S@:*[[Burkina Faso]]@@@@1@2@@danf@17-8-2009 10290910@unknown@formal@none@1@S@:*[[Burundi]]@@@@1@1@@danf@17-8-2009 10290920@unknown@formal@none@1@S@:*[[Cameroon]]@@@@1@1@@danf@17-8-2009 10290930@unknown@formal@none@1@S@:*[[Central African Republic]]@@@@1@3@@danf@17-8-2009 10290940@unknown@formal@none@1@S@:*[[Chad]]@@@@1@1@@danf@17-8-2009 10290950@unknown@formal@none@1@S@:*[[Comoros]]@@@@1@1@@danf@17-8-2009 10290960@unknown@formal@none@1@S@:*[[Congo (Brazzaville)]]@@@@1@2@@danf@17-8-2009 10290970@unknown@formal@none@1@S@:*[[Côte d'Ivoire]]@@@@1@2@@danf@17-8-2009 10290980@unknown@formal@none@1@S@:*[[Democratic Republic of the Congo]]@@@@1@5@@danf@17-8-2009 10290990@unknown@formal@none@1@S@:*[[Djibouti]]@@@@1@1@@danf@17-8-2009 10291000@unknown@formal@none@1@S@:*[[Equatorial Guinea]] (former colony of [[Spain]])@@@@1@6@@danf@17-8-2009 10291010@unknown@formal@none@1@S@:*[[Gabon]]@@@@1@1@@danf@17-8-2009 10291020@unknown@formal@none@1@S@:*[[Guinea]]@@@@1@1@@danf@17-8-2009 10291030@unknown@formal@none@1@S@:*[[Madagascar]]@@@@1@1@@danf@17-8-2009 10291040@unknown@formal@none@1@S@:*[[Mali]]@@@@1@1@@danf@17-8-2009 10291050@unknown@formal@none@1@S@:*[[Niger]]@@@@1@1@@danf@17-8-2009 10291060@unknown@formal@none@1@S@:*[[Rwanda]]@@@@1@1@@danf@17-8-2009 10291070@unknown@formal@none@1@S@:*[[Senegal]]@@@@1@1@@danf@17-8-2009 10291080@unknown@formal@none@1@S@:*[[Seychelles]]@@@@1@1@@danf@17-8-2009 10291090@unknown@formal@none@1@S@:*[[Togo]]@@@@1@1@@danf@17-8-2009 10291100@unknown@formal@none@1@S@In addition, French is an administrative language and commonly used though not on an official basis in [[Mauritius]] and in the [[Maghreb]] states:@@@@1@23@@danf@17-8-2009 10291110@unknown@formal@none@1@S@:* [[Mauritania]]@@@@1@2@@danf@17-8-2009 10291120@unknown@formal@none@1@S@:* [[Algeria]]@@@@1@2@@danf@17-8-2009 10291130@unknown@formal@none@1@S@:*[[Morocco]]@@@@1@1@@danf@17-8-2009 10291140@unknown@formal@none@1@S@:*[[Tunisia]].@@@@1@1@@danf@17-8-2009 10291150@unknown@formal@none@1@S@Various reforms have been implemented in recent decades in Algeria to improve the status of [[Arabic language|Arabic]] relative to French, especially in education.@@@@1@23@@danf@17-8-2009 10291160@unknown@formal@none@1@S@While the predominant European language in [[Egypt]] is [[English language|English]], French is considered to be a 
more sophisticated language by some elements of the Egyptian upper and upper-middle classes; for this reason, a typical educated Egyptian will learn French in addition to English at some point in his or her education.@@@@1@51@@danf@17-8-2009 10291170@unknown@formal@none@1@S@The perception of sophistication may be related to the use of French as the [[Noble court|royal court]] language of Egypt during the nineteenth century.@@@@1@24@@danf@17-8-2009 10291180@unknown@formal@none@1@S@Egypt participates in [[La Francophonie]].@@@@1@5@@danf@17-8-2009 10291190@unknown@formal@none@1@S@French is also the official language of [[Mayotte]] and [[Réunion]], two [[Overseas departments and territories of France|overseas territories]] of France located in the [[Indian Ocean]], as well as an administrative and educational language in [[Mauritius]], along with [[English language|English]].@@@@1@39@@danf@17-8-2009 10291200@unknown@formal@none@1@S@===Asia===@@@@1@1@@danf@17-8-2009 10291210@unknown@formal@none@1@S@====Lebanon====@@@@1@2@@danf@17-8-2009 10291220@unknown@formal@none@1@S@French was the official language in [[Lebanon]] along with [[Arabic language|Arabic]] until 1941, the year of the country's declaration of independence from [[France]].@@@@1@20@@danf@17-8-2009 10291230@unknown@formal@none@1@S@French is still seen as an official language by the [[Lebanese people]], as it is widely used by the Lebanese, especially for administrative purposes, and is taught in schools as a primary language along with [[Arabic]].@@@@1@36@@danf@17-8-2009 10291240@unknown@formal@none@1@S@====Southeast Asia====@@@@1@2@@danf@17-8-2009 10291250@unknown@formal@none@1@S@French is an administrative language in [[Laos]] and [[Cambodia]].@@@@1@9@@danf@17-8-2009 10291260@unknown@formal@none@1@S@French was historically spoken by the elite in the leased territory of [[Guangzhouwan]] in southern [[China]].@@@@1@15@@danf@17-8-2009 10291270@unknown@formal@none@1@S@In colonial [[Vietnam]], the elites spoke French and many who worked for the French spoke a French creole known as "[[Tây Bồi]]" (now extinct).@@@@1@24@@danf@17-8-2009 10291280@unknown@formal@none@1@S@====India====@@@@1@1@@danf@17-8-2009 10291290@unknown@formal@none@1@S@French has official status in the Indian [[Union Territory]] of [[Puducherry|Pondicherry]], along with the regional language [[Tamil language|Tamil]], and some students in Tamil Nadu may opt for French as their third or fourth language (usually behind [[English language|English]], Tamil and [[Hindi]]).@@@@1@39@@danf@17-8-2009 10291300@unknown@formal@none@1@S@French is also commonly taught as a third language in secondary schools in most cities of [[Maharashtra]] State, including [[Mumbai]], as part of the Secondary (X-SSC) and Higher Secondary School (XII-HSC) certificate examinations.@@@@1@32@@danf@17-8-2009 10291310@unknown@formal@none@1@S@===Oceania===@@@@1@1@@danf@17-8-2009 10291320@unknown@formal@none@1@S@French is also a second official language of the [[Pacific Island]] nation of [[Vanuatu]], along with France's territories of [[French Polynesia]], [[Wallis & Futuna]] and [[New Caledonia]].@@@@1@27@@danf@17-8-2009 10291330@unknown@formal@none@1@S@==Dialects==@@@@1@1@@danf@17-8-2009 10291340@unknown@formal@none@1@S@*[[Acadian French]]@@@@1@2@@danf@17-8-2009 10291350@unknown@formal@none@1@S@*[[African French]]@@@@1@2@@danf@17-8-2009 10291360@unknown@formal@none@1@S@*[[Aostan French]]@@@@1@2@@danf@17-8-2009 10291370@unknown@formal@none@1@S@*[[Belgian French]]@@@@1@2@@danf@17-8-2009 10291380@unknown@formal@none@1@S@*[[Cajun 
French]]@@@@1@2@@danf@17-8-2009 10291390@unknown@formal@none@1@S@*[[Canadian French]]@@@@1@2@@danf@17-8-2009 10291400@unknown@formal@none@1@S@*[[Cambodian French]]@@@@1@2@@danf@17-8-2009 10291410@unknown@formal@none@1@S@*Guyana French (see [[French Guiana]])@@@@1@5@@danf@17-8-2009 10291420@unknown@formal@none@1@S@*[[Indian French]]@@@@1@2@@danf@17-8-2009 10291430@unknown@formal@none@1@S@*[[Jersey Legal French]]@@@@1@3@@danf@17-8-2009 10291440@unknown@formal@none@1@S@*[[Lao French]]@@@@1@2@@danf@17-8-2009 10291450@unknown@formal@none@1@S@*[[Levantine French]] (most commonly referred to as Lebanese French, very similar to [[Maghreb French]])@@@@1@14@@danf@17-8-2009 10291460@unknown@formal@none@1@S@*[[Louisiana Creole French]]@@@@1@3@@danf@17-8-2009 10291470@unknown@formal@none@1@S@*[[Maghreb French]] (see also North African French)@@@@1@7@@danf@17-8-2009 10291480@unknown@formal@none@1@S@*[[Meridional French]]@@@@1@2@@danf@17-8-2009 10291490@unknown@formal@none@1@S@*[[Metropolitan France|Metropolitan French]]@@@@1@3@@danf@17-8-2009 10291500@unknown@formal@none@1@S@*[[Caldoche|New Caledonian French]]@@@@1@3@@danf@17-8-2009 10291510@unknown@formal@none@1@S@*[[Newfoundland French]]@@@@1@2@@danf@17-8-2009 10291520@unknown@formal@none@1@S@*Oceanic French@@@@1@2@@danf@17-8-2009 10291530@unknown@formal@none@1@S@*[[Quebec French]]@@@@1@2@@danf@17-8-2009 10291540@unknown@formal@none@1@S@*[[South East Asian French]]@@@@1@4@@danf@17-8-2009 10291550@unknown@formal@none@1@S@*[[Swiss French]]@@@@1@2@@danf@17-8-2009 10291560@unknown@formal@none@1@S@*[[Vietnamese French (dialect)|Vietnamese French]]@@@@1@4@@danf@17-8-2009 10291570@unknown@formal@none@1@S@*West Indian French@@@@1@3@@danf@17-8-2009 10291580@unknown@formal@none@1@S@==History==@@@@1@1@@danf@17-8-2009 10291590@unknown@formal@none@1@S@==Sounds==@@@@1@1@@danf@17-8-2009 10291600@unknown@formal@none@1@S@{{IPA notice}}@@@@1@2@@danf@17-8-2009 10291610@unknown@formal@none@1@S@Although there are many French regional accents, only one version of the language is normally chosen as a model for foreign learners; it has no commonly used special name, but has been termed ''[[français neutre]]'' (neutral French).@@@@1@37@@danf@17-8-2009 10291620@unknown@formal@none@1@S@* Voiced stops (i.e. {{IPA|/b d g/}}) are typically produced fully voiced throughout.@@@@1@13@@danf@17-8-2009 10291630@unknown@formal@none@1@S@* Voiceless stops (i.e. {{IPA|/p t k/}}) are unaspirated.@@@@1@9@@danf@17-8-2009 10291640@unknown@formal@none@1@S@* Nasals: The velar nasal {{IPA|/ŋ/}} occurs only in final position in borrowed (usually English) words: ''parking'', ''camping'', ''swing''.@@@@1@19@@danf@17-8-2009 10291650@unknown@formal@none@1@S@The palatal nasal {{IPA|/ɲ/}} can occur in word-initial position (e.g. ''gnon''), but it is most frequently found in intervocalic onset position or word-finally (e.g. ''montagne'').@@@@1@25@@danf@17-8-2009 10291660@unknown@formal@none@1@S@* Fricatives: French has three pairs of homorganic fricatives distinguished by voicing, i.e. 
labiodental {{IPA|/f/–/v/}}, dental {{IPA|/s/–/z/}}, and palato-alveolar {{IPA|/ʃ/–/ʒ/}}.@@@@1@20@@danf@17-8-2009 10291670@unknown@formal@none@1@S@Notice that {{IPA|/s/–/z/}} are dental, like the plosives {{IPA|/t/–/d/}}, and the nasal {{IPA|/n/}}.@@@@1@13@@danf@17-8-2009 10291680@unknown@formal@none@1@S@* French has one rhotic whose pronunciation varies considerably among speakers and phonetic contexts.@@@@1@14@@danf@17-8-2009 10291690@unknown@formal@none@1@S@In general it is described as a voiced uvular fricative as in {{IPA|[ʁu]}} roue "wheel" .@@@@1@16@@danf@17-8-2009 10291700@unknown@formal@none@1@S@Vowels are often lengthened before this segment.@@@@1@7@@danf@17-8-2009 10291710@unknown@formal@none@1@S@It can be reduced to an approximant, particularly in final position (e.g. "fort") or reduced to zero in some word-final positions.@@@@1@21@@danf@17-8-2009 10291720@unknown@formal@none@1@S@For other speakers, a uvular trill is also fairly common, and an apical trill {{IPA|[r]}} occurs in some dialects.@@@@1@19@@danf@17-8-2009 10291730@unknown@formal@none@1@S@* Lateral and central approximants: The lateral approximant {{IPA|/l/}} is unvelarised in both onset (''lire'') and coda position (''il'').@@@@1@19@@danf@17-8-2009 10291740@unknown@formal@none@1@S@In the onset, the central approximants {{IPA|[w]}}, {{IPA|[ɥ]}}, and {{IPA|[j]}} each correspond to a high vowel, {{IPA|/u/}}, {{IPA|/y/}}, and {{IPA|/i/}} respectively.@@@@1@21@@danf@17-8-2009 10291750@unknown@formal@none@1@S@There are a few minimal pairs where the approximant and corresponding vowel contrast, but there are also many cases where they are in free variation.@@@@1@25@@danf@17-8-2009 10291760@unknown@formal@none@1@S@Contrasts between {{IPA|/j/}} and {{IPA|/i/}} occur in final position as in {{IPA|/pɛj/}} ''paye'' "pay" vs. 
{{IPA|/pɛi/}} ''pays'' "country".@@@@1@18@@danf@17-8-2009 10291770@unknown@formal@none@1@S@French pronunciation follows strict rules based on spelling, but French spelling is often based more on history than phonology.@@@@1@19@@danf@17-8-2009 10291780@unknown@formal@none@1@S@The rules for pronunciation vary between dialects, but the standard rules are:@@@@1@12@@danf@17-8-2009 10291790@unknown@formal@none@1@S@* final consonants: Final single consonants, in particular ''s'', ''x'', ''z'', ''t'', ''d'', ''n'' and ''m'', are normally silent.@@@@1@19@@danf@17-8-2009 10291800@unknown@formal@none@1@S@(The final letters ''c'', ''r'', ''f'' and ''l'', however, are normally pronounced.)@@@@1@12@@danf@17-8-2009 10291810@unknown@formal@none@1@S@**When the following word begins with a vowel, though, a silent consonant ''may'' once again be pronounced, to provide a ''[[liaison (linguistics)|liaison]]'' or "link" between the two words.@@@@1@28@@danf@17-8-2009 10291820@unknown@formal@none@1@S@Some liaisons are ''mandatory'', for example the ''s'' in ''les amants'' or ''vous avez''; some are ''optional'', depending on [[dialect]] and [[register (linguistics)|register]], for example the first ''s'' in ''deux cents euros'' or ''euros irlandais''; and some are ''forbidden'', for example the ''s'' in ''beaucoup d'hommes aiment''.@@@@1@47@@danf@17-8-2009 10291830@unknown@formal@none@1@S@The ''t'' of ''et'' is never pronounced and the silent final consonant of a noun is only pronounced in the plural and in [[set phrase]]s like ''pied-à-terre''.@@@@1@27@@danf@17-8-2009 10291840@unknown@formal@none@1@S@Note that in the case of a word ending ''d'' as in ''pied-à-terre'', the consonant ''t'' is pronounced instead.@@@@1@19@@danf@17-8-2009 10291850@unknown@formal@none@1@S@** Doubling a final ''n'' and adding a silent ''e'' at the end of a word (e.g. ''chien'' → ''chienne'') makes it clearly pronounced.@@@@1@24@@danf@17-8-2009 10291860@unknown@formal@none@1@S@Doubling a final ''l'' and adding a silent ''e'' (e.g. ''gentil'' → ''gentille'') adds a [j] sound.@@@@1@17@@danf@17-8-2009 10291870@unknown@formal@none@1@S@* [[elision (French)|elision]] or vowel dropping: Some monosyllabic function words ending in ''a'' or ''e'', such as ''je'' and ''que'', drop their final vowel when placed before a word that begins with a vowel sound (thus avoiding a [[hiatus (linguistics)|hiatus]]).@@@@1@40@@danf@17-8-2009 10291880@unknown@formal@none@1@S@The missing vowel is replaced by an apostrophe. (e.g. ''je ai'' is instead pronounced and spelt → ''j'ai'').@@@@1@18@@danf@17-8-2009 10291890@unknown@formal@none@1@S@This gives for example the same pronunciation for ''l'homme qu'il a vu'' ("the man whom he saw") and ''l'homme qui l'a vu'' ("the man who saw him").@@@@1@27@@danf@17-8-2009 10291900@unknown@formal@none@1@S@==Orthography==@@@@1@1@@danf@17-8-2009 10291910@unknown@formal@none@1@S@* [[Nasal vowel|Nasal]]: ''[[n]]'' and ''[[m]]''.@@@@1@6@@danf@17-8-2009 10291920@unknown@formal@none@1@S@When ''n'' or ''m'' follows a vowel or diphthong, the ''n'' or ''m'' becomes silent and causes the preceding vowel to become nasalized (i.e. 
pronounced with the soft palate extended downward so as to allow part of the air to leave through the nostrils).@@@@1@44@@danf@17-8-2009 10291930@unknown@formal@none@1@S@Exceptions are when the ''n'' or ''m'' is doubled, or immediately followed by a vowel.@@@@1@15@@danf@17-8-2009 10291940@unknown@formal@none@1@S@The prefixes ''en-'' and ''em-'' are always nasalized.@@@@1@8@@danf@17-8-2009 10291950@unknown@formal@none@1@S@The rules are more complex than this and may vary between dialects.@@@@1@12@@danf@17-8-2009 10291960@unknown@formal@none@1@S@* [[digraph (orthography)|Digraphs]]: French does not introduce extra letters or [[diacritic]]s to specify its large range of vowel sounds and [[diphthongs]]; rather, it uses specific combinations of vowels, sometimes with following consonants, to show which sound is intended.@@@@1@38@@danf@17-8-2009 10291970@unknown@formal@none@1@S@* [[Consonant length|Gemination]]: Within words, double consonants are generally not pronounced as geminates in modern French (but geminates can be heard in the cinema or TV news from as recently as the 1970s, and in very refined elocution they may still occur).@@@@1@42@@danf@17-8-2009 10291980@unknown@formal@none@1@S@For example, ''illusion'' is pronounced {{IPA|[ilyzjɔ̃]}} and not {{IPA|[illyzjɔ̃]}}.@@@@1@9@@danf@17-8-2009 10291990@unknown@formal@none@1@S@But gemination does occur between words.@@@@1@6@@danf@17-8-2009 10292000@unknown@formal@none@1@S@For example, ''une info'' ("a piece of news") is pronounced {{IPA|[ynɛ̃fo]}}, whereas ''une nympho'' ("a nympho") is pronounced {{IPA|[ynnɛ̃fo]}}.@@@@1@17@@danf@17-8-2009 10292010@unknown@formal@none@1@S@* [[Diacritic|Accents]] are used sometimes for pronunciation, sometimes to distinguish similar words, and sometimes for etymology alone.@@@@1@17@@danf@17-8-2009 10292020@unknown@formal@none@1@S@**Accents that affect pronunciation@@@@1@4@@danf@17-8-2009 10292030@unknown@formal@none@1@S@***The [[acute accent]] (''l'accent aigu''), ''é'' (e.g. ''école''—school), means that the vowel is pronounced {{IPA|/e/}} instead of the default {{IPA|/ə/}}.@@@@1@20@@danf@17-8-2009 10292040@unknown@formal@none@1@S@***The [[grave accent]] (''l'accent grave''), ''è'' (e.g. ''élève''—pupil) means that the vowel is pronounced {{IPA|/ɛ/}} instead of the default {{IPA|/ə/}}.@@@@1@20@@danf@17-8-2009 10292050@unknown@formal@none@1@S@***The [[circumflex]] (''l'accent circonflexe'') ''ê'' (e.g. ''forêt''—forest) shows that an ''e'' is pronounced {{IPA|/ɛ/}} and that an ''o'' is pronounced {{IPA|/o/}}.@@@@1@21@@danf@17-8-2009 10292060@unknown@formal@none@1@S@In standard French it also signifies a pronunciation of {{IPA|/ɑ/}} for the letter ''a'', but this differentiation is disappearing.@@@@1@19@@danf@17-8-2009 10292070@unknown@formal@none@1@S@In the late 19th century, the circumflex was used in place of ''s'' where that letter was not to be pronounced.@@@@1@21@@danf@17-8-2009 10292080@unknown@formal@none@1@S@Thus, ''forest'' became ''forêt'' and ''hospital'' became ''hôpital''.@@@@1@8@@danf@17-8-2009 10292090@unknown@formal@none@1@S@***The [[Umlaut (diacritic)|diaeresis]] (''le tréma'') (e.g. ''naïf''—foolish, ''Noël''—Christmas), as in English, specifies that this vowel is pronounced separately from the preceding one, not combined with it, and is not a [[schwa]].@@@@1@29@@danf@17-8-2009 10292100@unknown@formal@none@1@S@***The [[cedilla]] (''la cédille'') ''ç'' (e.g. 
''garçon''—boy) means that the letter ''c'' is pronounced {{IPA|/s/}} in front of the hard vowels ''a'', ''o'' and ''u'' (''c'' is otherwise {{IPA|/k/}} before a hard vowel).@@@@1@33@@danf@17-8-2009 10292110@unknown@formal@none@1@S@''C'' is always pronounced {{IPA|/s/}} in front of the soft vowels ''e'', ''i'', and ''y'', thus ''ç'' is never found in front of soft vowels.@@@@1@25@@danf@17-8-2009 10292120@unknown@formal@none@1@S@**Accents with no pronunciation effect@@@@1@5@@danf@17-8-2009 10292130@unknown@formal@none@1@S@***The circumflex does not affect the pronunciation of the letters ''i'' or ''u'', and in most dialects, ''a'' as well.@@@@1@20@@danf@17-8-2009 10292140@unknown@formal@none@1@S@It usually indicates that an ''s'' came after it long ago, as in ''hôtel''.@@@@1@14@@danf@17-8-2009 10292150@unknown@formal@none@1@S@***All other accents are used only to distinguish similar words, as in the case of distinguishing the adverbs ''là'' and ''où'' ("there", "where") from the article ''la'' and the conjunction ''ou'' ("the" fem. sing., "or") respectively.@@@@1@36@@danf@17-8-2009 10292160@unknown@formal@none@1@S@==Grammar==@@@@1@1@@danf@17-8-2009 10292170@unknown@formal@none@1@S@French grammar shares several notable features with most other Romance languages, including:@@@@1@12@@danf@17-8-2009 10292180@unknown@formal@none@1@S@* the loss of Latin's [[declension]]s@@@@1@6@@danf@17-8-2009 10292190@unknown@formal@none@1@S@* only two [[grammatical gender]]s@@@@1@5@@danf@17-8-2009 10292200@unknown@formal@none@1@S@* the development of grammatical [[article (grammar)|article]]s from Latin [[demonstrative]]s@@@@1@10@@danf@17-8-2009 10292210@unknown@formal@none@1@S@* new [[tense]]s formed from auxiliaries@@@@1@6@@danf@17-8-2009 10292220@unknown@formal@none@1@S@French word order is [[Subject Verb Object]], except when the object is a pronoun, in which case the word order is [[Subject Object Verb]].@@@@1@24@@danf@17-8-2009 10292230@unknown@formal@none@1@S@Some rare archaisms allow for different word orders.@@@@1@8@@danf@17-8-2009 10292240@unknown@formal@none@1@S@==Vocabulary==@@@@1@1@@danf@17-8-2009 10292250@unknown@formal@none@1@S@The majority of French words derive from [[Vulgar Latin]] or were constructed from Latin or Greek roots.@@@@1@17@@danf@17-8-2009 10292260@unknown@formal@none@1@S@There are often pairs of words, one form being "popular" (noun) and the other one "savant" (adjective), both originating from Latin.@@@@1@21@@danf@17-8-2009 10292270@unknown@formal@none@1@S@Example:@@@@1@1@@danf@17-8-2009 10292280@unknown@formal@none@1@S@* brother: ''frère'' / ''fraternel'' < from Latin ''frater''@@@@1@9@@danf@17-8-2009 10292290@unknown@formal@none@1@S@* finger: ''doigt'' / ''digital'' < from Latin ''digitus''@@@@1@9@@danf@17-8-2009 10292300@unknown@formal@none@1@S@* faith: ''foi'' / ''fidèle'' < from Latin ''fides''@@@@1@9@@danf@17-8-2009 10292310@unknown@formal@none@1@S@* cold: ''froid'' / ''frigide'' < from Latin ''frigidus''@@@@1@9@@danf@17-8-2009 10292320@unknown@formal@none@1@S@* eye: ''œil'' / ''oculaire'' < from Latin ''oculus''@@@@1@9@@danf@17-8-2009 10292330@unknown@formal@none@1@S@In some examples there is a common word from Vulgar Latin and a more savant word borrowed directly from [[Medieval Latin]] or even [[Ancient Greek]].@@@@1@25@@danf@17-8-2009 10292340@unknown@formal@none@1@S@* '''Cheval'''—Concours '''équestre'''—'''Hippo'''drome@@@@1@3@@danf@17-8-2009 10292350@unknown@formal@none@1@S@The French words which have developed from Latin are usually less recognisable than [[Italian 
language|Italian]] words of Latin origin because as French evolved from [[Vulgar Latin]], the unstressed final [[syllable]] of many words was dropped or elided into the following word.@@@@1@41@@danf@17-8-2009 10292360@unknown@formal@none@1@S@It is estimated that 12% (4,200) of common French words found in a typical [[dictionary]] such as the ''Petit Larousse'' or ''Micro-Robert Plus'' (35,000 words) are of foreign origin.@@@@1@29@@danf@17-8-2009 10292370@unknown@formal@none@1@S@About 25% (1,054) of these foreign words come from [[English language|English]] and are fairly recent borrowings.@@@@1@16@@danf@17-8-2009 10292380@unknown@formal@none@1@S@The others are some 707 words from [[Italian language|Italian]], 550 from ancient [[Germanic languages]], 481 from ancient [[Gallo-Romance languages]], 215 from [[Arabic language|Arabic]], 164 from [[German language|German]], 160 from [[Celtic languages]], 159 from [[Spanish language|Spanish]], 153 from [[Dutch language|Dutch]], 112 from [[Persian language|Persian]] and [[Sanskrit language|Sanskrit]], 101 from [[Native American languages]], 89 from other [[Asian languages]], 56 from other [[Afro-Asiatic languages]], 55 from [[Slavic languages]] and [[Baltic languages]], 10 for [[Basque language|Basque]] and 144 — about three percent — from other languages.@@@@1@82@@danf@17-8-2009 10292390@unknown@formal@none@1@S@===Numerals===@@@@1@1@@danf@17-8-2009 10292400@unknown@formal@none@1@S@The French counting system is partially [[vigesimal]]: [[20 (number)|twenty]] (''{{lang|fr|vingt}}'') is used as a base number in the names of numbers from 60–99.@@@@1@23@@danf@17-8-2009 10292410@unknown@formal@none@1@S@The French word for ''eighty'', for example, is ''{{lang|fr|quatre-vingts}}'', which literally means "four twenties", and ''{{lang|fr|soixante-quinze}}'' (literally "sixty-fifteen") means 75.@@@@1@20@@danf@17-8-2009 10292420@unknown@formal@none@1@S@This reform arose after the [[French Revolution]] to unify the different counting system (mostly vigesimal near the coast, due to Celtic (via [[Basque language|Basque]]) and Viking influence).@@@@1@27@@danf@17-8-2009 10292430@unknown@formal@none@1@S@This system is comparable to the archaic English use of ''score'', as in "fourscore and seven" (87), or "threescore and ten" (70).@@@@1@22@@danf@17-8-2009 10292440@unknown@formal@none@1@S@[[Belgian French]] and [[Swiss French]] are different in this respect.@@@@1@10@@danf@17-8-2009 10292450@unknown@formal@none@1@S@In Belgium and Switzerland 70 and 90 are ''{{lang|fr|septante}}'' and ''{{lang|fr|nonante}}''.@@@@1@11@@danf@17-8-2009 10292460@unknown@formal@none@1@S@In Switzerland, depending on the local dialect, 80 can be ''{{lang|fr|quatre-vingts}}'' (Geneva, Neuchâtel, Jura) or ''{{lang|fr|huitante}}'' (Vaud, Valais, Fribourg).@@@@1@19@@danf@17-8-2009 10292470@unknown@formal@none@1@S@''Octante'' had been used in Switzerland in the past, but is now considered archaic.@@@@1@14@@danf@17-8-2009 10292480@unknown@formal@none@1@S@In Belgium, however, ''quatre-vingts'' is universally used.@@@@1@7@@danf@17-8-2009 10292490@unknown@formal@none@1@S@==Writing system==@@@@1@2@@danf@17-8-2009 10292500@unknown@formal@none@1@S@French is written using the 26 letters of the [[Latin alphabet]], plus five diacritics (the [[circumflex]] accent, [[acute accent]], [[grave accent]], [[Umlaut (diacritic)|diaeresis]], and [[cedilla]]) and the two [[Ligature (typography)|ligatures]] (œ) and (æ).@@@@1@33@@danf@17-8-2009 10292510@unknown@formal@none@1@S@French spelling, like English spelling, 
tends to preserve obsolete pronunciation rules.@@@@1@11@@danf@17-8-2009 10292520@unknown@formal@none@1@S@This is mainly due to extreme phonetic changes since the Old French period, without a corresponding change in spelling.@@@@1@19@@danf@17-8-2009 10292530@unknown@formal@none@1@S@Moreover, some conscious changes were made to restore Latin orthography:@@@@1@10@@danf@17-8-2009 10292540@unknown@formal@none@1@S@* Old French ''doit'' > French ''doigt'' "finger" (Latin ''digitus'')@@@@1@10@@danf@17-8-2009 10292550@unknown@formal@none@1@S@* Old French ''pie'' > French ''pied'' "foot" (Latin ''pes'' (stem: ''ped-'')@@@@1@12@@danf@17-8-2009 10292560@unknown@formal@none@1@S@As a result, it is difficult to predict the spelling on the basis of the sound alone.@@@@1@17@@danf@17-8-2009 10292570@unknown@formal@none@1@S@Final consonants are generally silent, except when the following word begins with a vowel.@@@@1@14@@danf@17-8-2009 10292580@unknown@formal@none@1@S@For example, all of these words end in a vowel sound: ''pied'', ''aller'', ''les'', ''finit'', ''beaux''.@@@@1@16@@danf@17-8-2009 10292590@unknown@formal@none@1@S@The same words followed by a vowel, however, may sound the consonants, as they do in these examples: ''beaux-arts'', ''les amis'', ''pied-à-terre''.@@@@1@22@@danf@17-8-2009 10292600@unknown@formal@none@1@S@On the other hand, a given spelling will almost always lead to a predictable sound, and the [[Académie française]] works hard to enforce and update this correspondence.@@@@1@27@@danf@17-8-2009 10292610@unknown@formal@none@1@S@In particular, a given vowel combination or diacritic predictably leads to one phoneme.@@@@1@13@@danf@17-8-2009 10292620@unknown@formal@none@1@S@The diacritics have '''phonetic''', '''semantic''', and '''etymological''' significance.@@@@1@8@@danf@17-8-2009 10292630@unknown@formal@none@1@S@* [[acute accent]] (''é''): Over an ''e'', indicates the sound of a short ''ai'' in English, with no [[diphthong]].@@@@1@19@@danf@17-8-2009 10292640@unknown@formal@none@1@S@An ''é'' in modern French is often used where a combination of ''e'' and a consonant, usually ''s,'' would have been used formerly: ''écouter'' < ''escouter''.@@@@1@26@@danf@17-8-2009 10292650@unknown@formal@none@1@S@This type of accent mark is called ''accent aigu'' in French.@@@@1@11@@danf@17-8-2009 10292660@unknown@formal@none@1@S@* [[grave accent]] (''à'', ''è'', ''ù''): Over ''a'' or ''u'', used only to distinguish homophones: ''à'' ("to") vs. ''a'' ("has"), ''ou'' ("or") vs. ''où'' ("where").@@@@1@25@@danf@17-8-2009 10292670@unknown@formal@none@1@S@Over an ''e'', indicates the sound {{IPA|/ɛ/}}.@@@@1@7@@danf@17-8-2009 10292680@unknown@formal@none@1@S@* [[circumflex]] (''â'', ''ê'', ''î'', ''ô'', ''û''): Over an ''a'', ''e'' or ''o'', indicates the sound {{IPA|/ɑ/}}, {{IPA|/ɛ/}} or {{IPA|/o/}}, respectively (the distinction ''a'' {{IPA|/a/}} vs. ''â'' {{IPA|/ɑ/}} tends to disappear in many dialects).@@@@1@34@@danf@17-8-2009 10292690@unknown@formal@none@1@S@Most often indicates the historical deletion of an adjacent letter (usually an ''s'' or a vowel): ''château'' < ''castel'', ''fête'' < ''feste'', ''sûr'' < ''seur'', ''dîner'' < ''disner''.@@@@1@28@@danf@17-8-2009 10292700@unknown@formal@none@1@S@It has also come to be used to distinguish homophones: ''du'' ("of the") vs. 
''dû'' (past participle of ''devoir'' "to have to do something (pertaining to an act)"; note that ''dû'' is in fact written thus because of a dropped ''e'': ''deu'').@@@@1@42@@danf@17-8-2009 10292710@unknown@formal@none@1@S@(''See [[Use of the circumflex in French]]'')@@@@1@7@@danf@17-8-2009 10292720@unknown@formal@none@1@S@* [[Umlaut (diacritic)|diaeresis]] or ''tréma'' (''ë'', ''ï'', ''ü'', ''ÿ''): Indicates that a vowel is to be pronounced separately from the preceding one: ''naïve'', ''Noël''.@@@@1@24@@danf@17-8-2009 10292730@unknown@formal@none@1@S@A diaeresis on ''y'' only occurs in some proper names and in modern editions of old French texts.@@@@1@18@@danf@17-8-2009 10292740@unknown@formal@none@1@S@Some proper names in which ''ÿ'' appears include ''Aÿ'' (commune in ''canton de la Marne'' formerly ''Aÿ-Champagne''), ''Rue des Cloÿs'' (alley in the 18th arrondisement of Paris), ''Croÿ'' (family name and hotel on the Boulevard Raspail, Paris), ''[[Château du Feÿ]]'' (near Joigny), ''Ghÿs'' (name of Flemish origin spelt ''Ghijs'' where ''ij'' in handwriting looked like ''ÿ'' to French clerks), ''l'Haÿ-les-Roses'' (commune between Paris and Orly airport), Pierre Louÿs (author), Moÿ (place in ''commune de l'Aisne'' and family name), and ''Le Blanc de Nicolaÿ'' (an insurance company in eastern France).@@@@1@89@@danf@17-8-2009 10292750@unknown@formal@none@1@S@The diaresis on ''u'' appears only in the biblical proper names ''Archélaüs'', ''Capharnaüm'', ''Emmaüs'', ''Ésaü'' and ''Saül''.@@@@1@17@@danf@17-8-2009 10292760@unknown@formal@none@1@S@Nevertheless, since the 1990 orthographic rectifications (which are not applied at all by most French people), the diaeresis in words containing ''guë'' (such as ''aiguë'' or ''ciguë'') may be moved onto the ''u'': ''aigüe'', ''cigüe''.@@@@1@35@@danf@17-8-2009 10292770@unknown@formal@none@1@S@Words coming from German retain the old Umlaut (''ä'', ''ö'' and ''ü'') if applicable but use French pronunciation, such as ''kärcher'' (trade mark of a pressure washer).@@@@1@27@@danf@17-8-2009 10292780@unknown@formal@none@1@S@* [[cedilla]] (''ç''): Indicates that an etymological ''c'' is pronounced {{IPA|/s/}} when it would otherwise be pronounced /k/.@@@@1@18@@danf@17-8-2009 10292790@unknown@formal@none@1@S@Thus ''je lance'' "I throw" (with ''c'' = {{IPA|[s]}} before ''e''), ''je lan'''ç'''ais'' "I was throwing" (''c'' would be pronounced {{IPA|[k]}} before ''a'' without the cedilla).@@@@1@26@@danf@17-8-2009 10292800@unknown@formal@none@1@S@The c cedilla (ç) softens the hard /k/ sound to /s/ before the vowels '''a''', '''o''' or '''u''', for example '''ça''' /sa/.@@@@1@22@@danf@17-8-2009 10292810@unknown@formal@none@1@S@C cedilla is never used before the vowels '''e''' or '''i''' since these two vowels always produce a soft /s/ sound ('''ce''', '''ci''').@@@@1@23@@danf@17-8-2009 10292820@unknown@formal@none@1@S@There are two [[ligatures]], which have various origins.@@@@1@8@@danf@17-8-2009 10292830@unknown@formal@none@1@S@* The ligature ''[[œ]]'' is a mandatory contraction of ''oe'' in certain words.@@@@1@13@@danf@17-8-2009 10292840@unknown@formal@none@1@S@Some of these are native French words, with the pronunciation {{IPA|/œ/}} or {{IPA|/ø/}}, e.g. 
''sœur'' "sister" {{IPA|/sœʁ/}}, ''œuvre'' "work (of art)" {{IPA|/œvʁ/}}.@@@@1@22@@danf@17-8-2009 10292850@unknown@formal@none@1@S@Note that it usually appears in the combination ''œu''; ''œil'' is an exception.@@@@1@13@@danf@17-8-2009 10292860@unknown@formal@none@1@S@Many of these words were originally written with the [[Digraph (orthography)|digraph]] ''eu''; the ''o'' in the ligature represents a sometimes artificial attempt to imitate the Latin spelling: Latin ''bovem'' > Old French ''buef''/''beuf'' > Modern French ''bœuf''. ''Œ'' is also used in words of Greek origin, as the Latin rendering of the Greek diphthong ''οι'', e.g. ''cœlacanthe'' "coelacanth".@@@@1@58@@danf@17-8-2009 10292870@unknown@formal@none@1@S@These words used to be pronounced with the vowel {{IPA|/e/}}, but in recent years a spelling pronunciation with {{IPA|/ø/}} has taken hold, e.g. ''œsophage'' {{IPA|/ezɔfaʒ/}} or {{IPA|/øzɔfaʒ/}}.@@@@1@27@@danf@17-8-2009 10292880@unknown@formal@none@1@S@The pronunciation with {{IPA|/e/}} is often seen to be more correct.@@@@1@11@@danf@17-8-2009 10292890@unknown@formal@none@1@S@The ligature œ is not used in some occurrences of the letter combination ''oe'', for example, when ''o'' is part of a prefix (''coexister'').@@@@1@24@@danf@17-8-2009 10292900@unknown@formal@none@1@S@* The ligature ''[[æ]]'' is rare and appears in some words of Latin and Greek origin like ''ægosome'', ''ægyrine'', ''æschne'', ''cæcum'', ''nævus'' or ''uræus''.@@@@1@24@@danf@17-8-2009 10292910@unknown@formal@none@1@S@The vowel quality is identical to é {{IPA|/e/}}.@@@@1@8@@danf@17-8-2009 10292920@unknown@formal@none@1@S@French writing, as with any language, is affected by the spoken language.@@@@1@12@@danf@17-8-2009 10292930@unknown@formal@none@1@S@In Old French, the plural for ''animal'' was ''animals''.@@@@1@9@@danf@17-8-2009 10292940@unknown@formal@none@1@S@Common speakers pronounced a ''u'' before a word ending in ''l'' as the plural.@@@@1@14@@danf@17-8-2009 10292950@unknown@formal@none@1@S@This resulted in ''animauls''.@@@@1@4@@danf@17-8-2009 10292960@unknown@formal@none@1@S@As the French language evolved this vanished and the form ''animaux'' (''aux'' pronounced {{IPA|/o/}}) was admitted.@@@@1@16@@danf@17-8-2009 10292970@unknown@formal@none@1@S@The same is true for ''cheval'' pluralized as ''chevaux'' and many others.@@@@1@12@@danf@17-8-2009 10292980@unknown@formal@none@1@S@Also ''castel'' pl. ''castels'' became ''château'' pl. ''châteaux''.@@@@1@8@@danf@17-8-2009 10292990@unknown@formal@none@1@S@==Samples==@@@@1@1@@danf@17-8-2009 10300010@unknown@formal@none@1@S@
German language
@@@@1@2@@danf@17-8-2009 10300020@unknown@formal@none@1@S@The '''German language''' ({{lang|de|''Deutsch''}}) is a [[West Germanic languages|West Germanic language]] and one of the world's [[world language|major languages]].@@@@1@19@@danf@17-8-2009 10300030@unknown@formal@none@1@S@German is closely related to and classified alongside [[English language|English]] and [[Dutch language|Dutch]].@@@@1@13@@danf@17-8-2009 10300040@unknown@formal@none@1@S@Around the world, German is spoken by approximately 100 million [[First language|native speakers]] and also about 80 million non-native speakers, and [[Standard German]] is widely taught in schools, universities, and [[Goethe Institute]]s worldwide.@@@@1@33@@danf@17-8-2009 10300050@unknown@formal@none@1@S@==Geographic distribution==@@@@1@2@@danf@17-8-2009 10300060@unknown@formal@none@1@S@===Europe===@@@@1@1@@danf@17-8-2009 10300070@unknown@formal@none@1@S@German is spoken primarily in [[Languages of Germany|Germany]] (95%), [[Languages of Austria|Austria]] (89%) and [[Linguistic geography of Switzerland|Switzerland]] (64%), together with [[Liechtenstein]] and [[Luxembourg]] ([[D-A-CH-Li-Lux]]), which constitute the countries where German is the majority language.@@@@1@33@@danf@17-8-2009 10300080@unknown@formal@none@1@S@Other European German-speaking communities are found in [[Italy]] ([[Province of Bolzano-Bozen|Bolzano-Bozen]]), in the [[German speaking community in Belgium|East Cantons]] of [[Belgium]], in the [[France|French]] region of [[Alsace]], which changed hands between Germany and France several times in the course of history, and in some border villages of the former [[South Jutland County]] (in German, ''Nordschleswig'', in Danish, ''Sønderjylland'') of [[Denmark]].@@@@1@55@@danf@17-8-2009 10300090@unknown@formal@none@1@S@Some German-speaking communities still survive in parts of [[Romania]], the [[Czech Republic]], [[Poland]], [[Hungary]], and above all [[Russia]] and [[Kazakhstan]], although forced expulsions after World War II and massive emigration to Germany in the 1980s and 1990s have depopulated most of these communities.@@@@1@43@@danf@17-8-2009 10300100@unknown@formal@none@1@S@It is also spoken by German-speaking foreign populations and some of their descendants in [[Portugal]], [[Spain]], Italy, [[Morocco]], [[Egypt]], [[Israel]], [[Cyprus]], [[Turkey]], [[Greece]], [[United Kingdom]], [[Netherlands]], [[Scandinavia]], [[Siberia]] in Russia, Hungary, Romania, [[Bulgaria]], and the former [[Yugoslavia]] ([[Bosnia and Herzegovina|Bosnia]], [[Serbia]], [[Republic of Macedonia|Macedonia]], [[Croatia]] and [[Slovenia]]).@@@@1@47@@danf@17-8-2009 10300110@unknown@formal@none@1@S@In Luxembourg and the surrounding areas, a large part of the native population speaks German dialects, and some people also have a command of standard German (especially in Luxembourg), although in the [[France|French]] regions of [[Alsace]] (German: ''Elsass'') and [[Lorraine (region)|Lorraine]] (German: ''Lothringen'') [[French language|French]] has replaced the local German dialects as the official language, even though it has not been fully replaced on the street.@@@@1@62@@danf@17-8-2009 10300120@unknown@formal@none@1@S@===Overseas===@@@@1@1@@danf@17-8-2009 10300130@unknown@formal@none@1@S@Outside of Europe and the former [[Soviet Union]], the largest German-speaking communities are to be found in the [[United States]], [[Canada]], [[Brazil]] and [[Argentina]], to which millions of Germans migrated over the last 200 years; but the vast majority of their descendants no longer speak German.@@@@1@46@@danf@17-8-2009 
10300140@unknown@formal@none@1@S@Additionally, German-speaking communities can be found in the former [[List of former German colonies|German colony]] of [[Namibia]] independent from [[South Africa]] since 1990, as well as in the other countries of German emigration such as [[Canada]], [[Mexico]], [[Dominican Republic]], [[Paraguay]], [[Uruguay]], [[Chile]], [[Peru]], [[Venezuela]] (where [[Alemán Coloniero]] developed), South Africa and [[Australia]].@@@@1@52@@danf@17-8-2009 10300150@unknown@formal@none@1@S@====South America====@@@@1@2@@danf@17-8-2009 10300160@unknown@formal@none@1@S@In Brazil the largest concentrations of German speakers are in [[Rio Grande do Sul]] (where [[Riograndenser Hunsrückisch]] was developed), [[Santa Catarina (state)|Santa Catarina]], [[Paraná (state)|Paraná]], and [[Espírito Santo]], and large German-speaking descendant communities in Argentina, Uruguay and Chile.@@@@1@38@@danf@17-8-2009 10300170@unknown@formal@none@1@S@In the 20th century, over 100,000 German [[Refugee|political refugees]] and invited entrepreneurs settled in [[Latin America]], such as [[Costa Rica]], [[Panama]], Venezuela and the Dominican Republic to establish German-speaking enclaves, and there is a reportedly small [[German immigration to Puerto Rico]].@@@@1@41@@danf@17-8-2009 10300180@unknown@formal@none@1@S@====North America====@@@@1@2@@danf@17-8-2009 10300190@unknown@formal@none@1@S@The United States has the largest concentration of German speakers outside of Europe; an indication of this presence can be found in the names of such villages and towns as [[New Leipzig, North Dakota|New Leipzig]], [[Munich, North Dakota|Munich]], [[Karlsruhe, North Dakota|Karlsruhe]], and [[Strasburg, North Dakota|Strasburg]], [[North Dakota]], and [[New Braunfels]], Texas.@@@@1@51@@danf@17-8-2009 10300200@unknown@formal@none@1@S@Though over the course of the 20th century many of the descendants of 18th and 19th-century immigrants ceased speaking German at home, small populations of elderly (as well as some younger) speakers can be found in [[Pennsylvania]] ([[Amish]], [[Hutterites]], [[Dunkards]] and some [[Mennonites]] historically spoke [[Pennsylvania German language|Pennsylvania Dutch]] (a [[West Central German]] variety) and [[Hutterite German]]), [[Kansas]] (Mennonites and [[Volga German]]s), North Dakota (Hutterite Germans, Mennonites, [[History of Germans in Russia and the Soviet Union|Russian German]]s, Volga Germans, and [[Baltic Germans]]), [[South Dakota]], [[Montana]], [[Texas]] ([[Texas German]]), [[Wisconsin]], [[Indiana]], [[Louisiana]] and [[Oklahoma]].@@@@1@93@@danf@17-8-2009 10300210@unknown@formal@none@1@S@Early twentieth century immigration was often to [[St. Louis, Missouri|St. Louis]], [[Chicago]], [[New York]], [[Pittsburgh]] and [[Cincinnati]].@@@@1@17@@danf@17-8-2009 10300220@unknown@formal@none@1@S@Most of the post–[[World War II]] wave are in the New York, [[Philadelphia]], [[Los Angeles]], [[San Francisco]] and Chicago [[urban area]]s, and in [[Florida]], [[Arizona]] and [[California]] where large communities of retired German, Swiss and Austrian expatriates live.@@@@1@38@@danf@17-8-2009 10300230@unknown@formal@none@1@S@The [[German Americans|American population of German ancestry]] is above 60 million.@@@@1@11@@danf@17-8-2009 10300240@unknown@formal@none@1@S@The German language is the third largest language in the U.S. 
after [[Spanish language|Spanish]].@@@@1@14@@danf@17-8-2009 10300250@unknown@formal@none@1@S@In Canada there are people of German ancestry throughout the country and especially in the western cities such as [[Kelowna]].@@@@1@20@@danf@17-8-2009 10300260@unknown@formal@none@1@S@German is also spoken in [[Ontario]] and southern [[Nova Scotia]].@@@@1@10@@danf@17-8-2009 10300270@unknown@formal@none@1@S@There is a large and vibrant community in the city of [[Kitchener, Ontario]].@@@@1@13@@danf@17-8-2009 10300280@unknown@formal@none@1@S@German immigrants were instrumental in the country's three largest urban areas: [[Montreal]], [[Toronto]] and [[Vancouver]], but post-WWII immigrants managed to preserve a fluency in the German language in their respective neighborhoods and sections.@@@@1@33@@danf@17-8-2009 10300290@unknown@formal@none@1@S@In the first half of the 20th century, over a million [[German-Canadian]]s made the language one of Canada's most spoken after [[French language|French]].@@@@1@23@@danf@17-8-2009 10300300@unknown@formal@none@1@S@In Mexico there are also large populations of German ancestry, mainly in the cities of: [[Mexico City]], [[Puebla]], [[Mazatlán]], [[Tapachula]], and larger populations scattered in the states of [[Chihuahua]], [[Durango]], and [[Zacatecas]].@@@@1@32@@danf@17-8-2009 10300310@unknown@formal@none@1@S@German ancestry is also said to be found in neighboring towns around [[Guadalajara, Jalisco]] and much of Northern Mexico, where German influence was immersed into the Mexican culture.@@@@1@28@@danf@17-8-2009 10300320@unknown@formal@none@1@S@Standard German is spoken by the affluent German communities in Puebla, Mexico City, [[Nuevo Leon]], [[San Luis Potosi]] and [[Quintana Roo]].@@@@1@21@@danf@17-8-2009 10300330@unknown@formal@none@1@S@German immigration in the twentieth century was small, but produced German-speaking communities in Central America (i.e.@@@@1@16@@danf@17-8-2009 10300340@unknown@formal@none@1@S@[[Guatemala]], [[Honduras]] and [[Nicaragua]]) and the Caribbean Islands like the [[Dominican Republic]].@@@@1@12@@danf@17-8-2009 10300350@unknown@formal@none@1@S@'''Dialects in North America:'''@@@@1@4@@danf@17-8-2009 10300360@unknown@formal@none@1@S@The dialects of German which are or were primarily spoken in colonies or communities founded by German speaking people resemble the dialects of the regions the founders came from.@@@@1@29@@danf@17-8-2009 10300370@unknown@formal@none@1@S@For example, Pennsylvania German resembles dialects of the [[Rhenish Palatinate|Palatinate]], and Hutterite German resembles dialects of [[Carinthia (state)|Carinthia]].@@@@1@18@@danf@17-8-2009 10300380@unknown@formal@none@1@S@[[Texas German]] is a dialect spoken in the areas of Texas settled by the [[Adelsverein]], such as New Braunfels and Fredericksburg.@@@@1@21@@danf@17-8-2009 10300390@unknown@formal@none@1@S@In the [[Amana Colonies]] in the state of Iowa [[Amana German]] is spoken.@@@@1@13@@danf@17-8-2009 10300400@unknown@formal@none@1@S@[[Plautdietsch]] is a large [[minority language]] spoken in Northern Mexico by the [[Mennonite]] communities, and is spoken by more than 200,000 people in Mexico.@@@@1@24@@danf@17-8-2009 10300410@unknown@formal@none@1@S@[[Hutterite German]] is an Upper German dialect of the [[Austro-Bavarian]] variety of the German language, which is spoken by Hutterite communities in Canada and the United States.@@@@1@27@@danf@17-8-2009 10300420@unknown@formal@none@1@S@Hutterite is spoken in the U.S. 
states of [[Washington]], Montana, North Dakota and South Dakota, and [[Minnesota]]; and in the Canadian provinces of [[Alberta]], [[Saskatchewan]] and [[Manitoba]].@@@@1@27@@danf@17-8-2009 10300430@unknown@formal@none@1@S@Its speakers belong to some Schmiedleit, Lehrerleit, and Dariusleit Hutterite groups, but there are also speakers among the older generations of Prairieleit (the descendants of those Hutterites who chose not to settle in colonies).@@@@1@34@@danf@17-8-2009 10300440@unknown@formal@none@1@S@Hutterite children who grow up in the colonies learn and speak first Hutterite German before learning English in the public school, the standard language of the surrounding areas.@@@@1@28@@danf@17-8-2009 10300450@unknown@formal@none@1@S@Many colonies though continue with German Grammar School, separate from the public school, throughout a student's elementary education.@@@@1@18@@danf@17-8-2009 10300460@unknown@formal@none@1@S@====Creoles====@@@@1@1@@danf@17-8-2009 10300470@unknown@formal@none@1@S@There is an important German creole being studied and recovered, named [[Unserdeutsch]], spoken in the former German colony of [[Papua New Guinea]], across [[Micronesia]] and in northern Australia (i.e. coastal parts of [[Queensland]] and [[Western Australia]]), by few elderly people.@@@@1@40@@danf@17-8-2009 10300480@unknown@formal@none@1@S@The risk of its extinction is serious and efforts to revive interest in the language are being implemented by scholars.@@@@1@20@@danf@17-8-2009 10300490@unknown@formal@none@1@S@====Internet====@@@@1@1@@danf@17-8-2009 10300500@unknown@formal@none@1@S@According to [[Global Reach]] (2004), 6.9% of the Internet population is German.@@@@1@12@@danf@17-8-2009 10300510@unknown@formal@none@1@S@According to [[Netz-tipp]] (2002), 7.7% of webpages are written in German, making it second only to English in the European language group.@@@@1@22@@danf@17-8-2009 10300520@unknown@formal@none@1@S@They also report that 12% of Google's users use its German interface.@@@@1@12@@danf@17-8-2009 10300530@unknown@formal@none@1@S@Older statistics: Babel (1998) found somewhat similar demographics.@@@@1@8@@danf@17-8-2009 10300540@unknown@formal@none@1@S@FUNREDES (1998) and Vilaweb (2000) both found that German is the third most popular language used by websites, after English and Japanese.@@@@1@22@@danf@17-8-2009 10300550@unknown@formal@none@1@S@==History==@@@@1@1@@danf@17-8-2009 10300560@unknown@formal@none@1@S@The history of the language begins with the [[High German consonant shift]] during the [[migration period]], separating [[High German]] dialects from common [[West Germanic]].@@@@1@24@@danf@17-8-2009 10300570@unknown@formal@none@1@S@The earliest testimonies of [[Old High German]] are from scattered [[Elder Futhark]] inscriptions, especially in [[Alemannic]], from the 6th century, the earliest glosses (''[[Abrogans]]'') date to the 8th and the oldest coherent texts (the ''[[Hildebrandslied]]'', the ''[[Muspilli]]'' and the [[Merseburg Incantations]]) to the 9th century.@@@@1@45@@danf@17-8-2009 10300580@unknown@formal@none@1@S@[[Old Saxon]] at this time belongs to the [[Ingvaeonic|North Sea Germanic]] cultural sphere, and [[Low Saxon]] should fall under German rather than [[Anglo-Frisian]] influence during the [[Holy Roman Empire]].@@@@1@29@@danf@17-8-2009 10300590@unknown@formal@none@1@S@As Germany was divided into many different [[state]]s, the only force working for a unification or [[standard language|standardization]] of German during a period of several hundred years was the 
general preference of writers trying to write in a way that could be understood in the largest possible area.@@@@1@48@@danf@17-8-2009 10300600@unknown@formal@none@1@S@When [[Martin Luther]] translated the [[Bible]] (the [[New Testament]] in 1522 and the [[Old Testament]], published in parts and completed in 1534) he based his translation mainly on the bureaucratic standard language used in Saxony (''sächsische Kanzleisprache''), also known as ''Meißner-Deutsch'' (Meißner-German), which was the most widely understood language at this time, because the region it was spoken in was quite influential amongst the German states.@@@@1@66@@danf@17-8-2009 10300610@unknown@formal@none@1@S@This language was based on Eastern Upper and Eastern Central German dialects and preserved much of the grammatical system of Middle High German (unlike the spoken German dialects in Central and Upper Germany, which at that time had already begun to lose the [[genitive case]] and the preterite tense).@@@@1@48@@danf@17-8-2009 10300620@unknown@formal@none@1@S@In the beginning, copies of the Bible had a long list for each region, which translated words unknown in the region into the regional dialect.@@@@1@25@@danf@17-8-2009 10300630@unknown@formal@none@1@S@[[Roman Catholics]] rejected Luther's translation in the beginning and tried to create their own Catholic standard (''gemeines Deutsch'') — which, however, only differed from 'Protestant German' in some minor details.@@@@1@30@@danf@17-8-2009 10300640@unknown@formal@none@1@S@It took until the middle of the 18th century to create a standard that was widely accepted, thus ending the period of [[Early New High German]].@@@@1@26@@danf@17-8-2009 10300650@unknown@formal@none@1@S@In 1901 the 2nd Orthographical Conference ended with a complete standardization of the German language in its written form, while the ''Deutsche Bühnensprache'' (literally: ''German stage-language'') had already established conventions for German pronunciation three years earlier, which were later to become obligatory for general German pronunciation.@@@@1@43@@danf@17-8-2009 10300660@unknown@formal@none@1@S@German used to be the language of commerce and government in the [[Habsburg Empire]], which encompassed a large area of Central and Eastern Europe.@@@@1@24@@danf@17-8-2009 10300670@unknown@formal@none@1@S@Until the mid-19th century it was essentially the language of townspeople throughout most of the Empire.@@@@1@16@@danf@17-8-2009 10300680@unknown@formal@none@1@S@It indicated that the speaker was a [[merchant]] or an urbanite, rather than indicating the speaker's nationality.@@@@1@13@@danf@17-8-2009 10300690@unknown@formal@none@1@S@Some cities, such as [[Prague]] (German: ''Prag'') and [[Budapest]] ([[Buda]], German: ''Ofen''), were gradually [[Germanization|Germanized]] in the years after their incorporation into the Habsburg domain.@@@@1@25@@danf@17-8-2009 10300700@unknown@formal@none@1@S@Others, such as [[Bratislava]] (German: ''Pressburg''), were originally settled during the Habsburg period and were primarily German at that time.@@@@1@19@@danf@17-8-2009 10300710@unknown@formal@none@1@S@A few cities such as [[Milan]] (German: ''Mailand'') remained primarily non-German.@@@@1@11@@danf@17-8-2009 10300720@unknown@formal@none@1@S@However, most cities were primarily German during this time, such as Prague, Budapest, Bratislava, [[Zagreb]] (German: ''Agram''), and [[Ljubljana]] (German: ''Laibach''), though they were surrounded by territory that spoke other languages.@@@@1@33@@danf@17-8-2009 10300730@unknown@formal@none@1@S@Until 
about 1800, standard German was almost only a written language.@@@@1@11@@danf@17-8-2009 10300740@unknown@formal@none@1@S@At this time, people in urban [[northern Germany]], who spoke dialects very different from Standard German, learned it almost like a foreign language and tried to pronounce it as close to the spelling as possible.@@@@1@35@@danf@17-8-2009 10300750@unknown@formal@none@1@S@Prescriptive pronunciation guides used to consider northern [[German phonology|German pronunciation]] to be the standard.@@@@1@14@@danf@17-8-2009 10300760@unknown@formal@none@1@S@However, the actual pronunciation of standard German varies from region to region.@@@@1@12@@danf@17-8-2009 10300770@unknown@formal@none@1@S@Media and written works are almost all produced in standard German (often called ''Hochdeutsch'' in German), which is understood in all areas where German is spoken, except by [[Nursery school|pre-school]] children in areas which speak only dialect, for example [[Switzerland]] and [[Austria]].@@@@1@42@@danf@17-8-2009 10300780@unknown@formal@none@1@S@However, in this age of television, even they now usually learn to understand Standard German before school age.@@@@1@18@@danf@17-8-2009 10300790@unknown@formal@none@1@S@The first dictionary of the [[Brothers Grimm]], the 16 parts of which were issued between 1852 and 1860, remains the most comprehensive guide to the words of the German language.@@@@1@30@@danf@17-8-2009 10300800@unknown@formal@none@1@S@In 1860, grammatical and orthographic rules first appeared in the ''[[Duden Handbook]]''.@@@@1@12@@danf@17-8-2009 10300810@unknown@formal@none@1@S@In 1901, this was declared the standard definition of the German language.@@@@1@12@@danf@17-8-2009 10300820@unknown@formal@none@1@S@Official revisions of some of these rules were not issued until 1998, when the [[German spelling reform of 1996]] was officially promulgated by governmental representatives of all German-speaking countries.@@@@1@29@@danf@17-8-2009 10300830@unknown@formal@none@1@S@Since the reform, German spelling has been in an eight-year transitional period during which the reformed spelling is taught in most schools, while traditional and reformed spellings co-exist in the media.@@@@1@30@@danf@17-8-2009 10300840@unknown@formal@none@1@S@See [[German spelling reform of 1996]] for an overview of the public debate concerning the reform, with some major newspapers, magazines, and several well-known writers refusing to adopt it.@@@@1@30@@danf@17-8-2009 10300850@unknown@formal@none@1@S@The German spelling reform of 1996 led to public controversy, and indeed to considerable dispute.@@@@1@14@@danf@17-8-2009 10300860@unknown@formal@none@1@S@The parliaments of some states (''Bundesländer''), namely [[North Rhine-Westphalia|North Rhine-Westphalia]] and Bavaria, would not accept it.@@@@1@14@@danf@17-8-2009 10300870@unknown@formal@none@1@S@The dispute at one point reached the highest court, which dealt with it quickly, ruling that the states had to decide for themselves and that only in schools could the reform be made the official rule - everybody else could continue writing as they had learned it.@@@@1@50@@danf@17-8-2009 10300880@unknown@formal@none@1@S@After 10 years, without any intervention by the federal parliament, a major yet incomplete revision was implemented in 2006, just in time for that year's new school year.@@@@1@29@@danf@17-8-2009 10300890@unknown@formal@none@1@S@In 2007, some long-established spellings were finally invalidated even though they had caused little or no trouble.@@@@1@17@@danf@17-8-2009 
10300900@unknown@formal@none@1@S@The only sure and easily recognizable symptom of a text's being in compliance with the reform is the -ss at the end of words, like in ''dass'' and ''muss''.@@@@1@29@@danf@17-8-2009 10300910@unknown@formal@none@1@S@Classic spelling forbade this ending, instead using ''daß'' and ''muß''.@@@@1@10@@danf@17-8-2009 10300920@unknown@formal@none@1@S@The cause of the controversy evolved around the question whether a language is part of the culture which must be preserved or a means of communicating information which has to allow for growth.@@@@1@33@@danf@17-8-2009 10300930@unknown@formal@none@1@S@(The reformers seemed to be unimpressed by the fact that a considerable part of that culture - namely the entire German literature of the 20th century - is in the old spelling.)@@@@1@32@@danf@17-8-2009 10300940@unknown@formal@none@1@S@The increasing use of English in Germany's higher education system, as well as in business and in popular culture, has led various German academics to state, not necessarily from an entirely negative perspective, that German is a language in decline in its native country.@@@@1@44@@danf@17-8-2009 10300950@unknown@formal@none@1@S@For example, Ursula Kimpel, of the [[University of Tübingen]], said in 2005 that “German universities are offering more courses in English because of the large number of students coming from abroad.@@@@1@31@@danf@17-8-2009 10300960@unknown@formal@none@1@S@German is unfortunately a language in decline.@@@@1@7@@danf@17-8-2009 10300970@unknown@formal@none@1@S@We need and want our professors to be able to teach effectively in English.”@@@@1@14@@danf@17-8-2009 10300980@unknown@formal@none@1@S@==Standard German==@@@@1@2@@danf@17-8-2009 10300990@unknown@formal@none@1@S@Standard German originated not as a traditional dialect of a specific region, but as a [[written language]].@@@@1@17@@danf@17-8-2009 10301000@unknown@formal@none@1@S@However, there are places where the traditional regional dialects have been replaced by standard German; this is the case in vast stretches of Northern Germany, but also in major cities in other parts of the country.@@@@1@36@@danf@17-8-2009 10301010@unknown@formal@none@1@S@Standard German differs regionally, between German-speaking countries, in [[vocabulary]] and some instances of [[pronunciation]], and even [[grammar]] and [[orthography]].@@@@1@19@@danf@17-8-2009 10301020@unknown@formal@none@1@S@This variation must not be confused with the variation of local dialects.@@@@1@12@@danf@17-8-2009 10301030@unknown@formal@none@1@S@Even though the regional varieties of standard German are only to a certain degree influenced by the local dialects, they are very distinct.@@@@1@23@@danf@17-8-2009 10301040@unknown@formal@none@1@S@German is thus considered a pluricentric language.@@@@1@7@@danf@17-8-2009 10301050@unknown@formal@none@1@S@In most regions, the speakers use a continuum of mixtures from more dialectal varieties to more standard varieties according to situation.@@@@1@21@@danf@17-8-2009 10301060@unknown@formal@none@1@S@In the German-speaking parts of Switzerland, mixtures of dialect and standard are very seldom used, and the use of standard German is largely restricted to the written language.@@@@1@28@@danf@17-8-2009 10301070@unknown@formal@none@1@S@Therefore, this situation has been called a ''medial [[diglossia]]''.@@@@1@9@@danf@17-8-2009 10301080@unknown@formal@none@1@S@[[Swiss Standard German]] is used in the Swiss education system.@@@@1@10@@danf@17-8-2009 
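To make the orthographic marker mentioned in the spelling-reform discussion above concrete, the following is a minimal illustrative sketch; it is not part of the article, and the function name, word lists, and heuristic are assumptions introduced here purely for illustration. It guesses whether a short German text follows the reformed or the classic convention using only the example words ''dass''/''muss'' versus ''daß''/''muß''; a real detector would need the full rule, since reformed spelling still keeps ''ß'' after long vowels and diphthongs (as in ''Straße'').
<source lang="python">
# Minimal illustrative sketch (assumed names, not from the article):
# guess the spelling convention of a German text using only the symptom
# described above, i.e. word-final "ss" in "dass"/"muss" (reformed, post-1996)
# versus "daß"/"muß" (classic). Reformed spelling still keeps "ß" after long
# vowels and diphthongs (e.g. "Straße"), so this is only a rough heuristic.
import re

REFORMED = {"dass", "muss"}   # example words in reformed spelling
CLASSIC = {"daß", "muß"}      # the same words in classic spelling

def spelling_convention(text):
    """Return 'reformed', 'classic', or 'unclear' based on the sample words."""
    words = re.findall(r"[a-zäöüß]+", text.lower())
    reformed_hits = sum(1 for w in words if w in REFORMED)
    classic_hits = sum(1 for w in words if w in CLASSIC)
    if reformed_hits and not classic_hits:
        return "reformed"
    if classic_hits and not reformed_hits:
        return "classic"
    return "unclear"

print(spelling_convention("Ich weiß, dass er kommen muss."))  # -> reformed
print(spelling_convention("Ich weiß, daß er kommen muß."))    # -> classic
</source>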
10301090@unknown@formal@none@1@S@===Official status===@@@@1@2@@danf@17-8-2009 10301100@unknown@formal@none@1@S@Standard German is the only [[official language]] in Liechtenstein and Austria; it shares official status in [[Germany]] (with [[Danish language|Danish]], [[Frisian languages|Frisian]] and [[Sorbian languages|Sorbian]] as minority languages), Switzerland (with [[French language|French]], [[Italian language|Italian]] and [[Romansh language|Romansh]]), Belgium (with [[Dutch language|Dutch]] and French) and Luxembourg (with French and [[Luxembourgish language|Luxembourgish]]).@@@@1@50@@danf@17-8-2009 10301110@unknown@formal@none@1@S@It is used as a local official language in Italy ([[Province of Bolzano-Bozen]]), as well as in the cities of [[Sopron]] (Hungary), Krahule ([[Slovakia]]) and several cities in Romania.@@@@1@29@@danf@17-8-2009 10301120@unknown@formal@none@1@S@It is the official language (with Italian) of the [[Vatican City|Vatican]] [[Swiss Guard]].@@@@1@13@@danf@17-8-2009 10301130@unknown@formal@none@1@S@German has an officially recognized status as regional or auxiliary language in Denmark ([[South Jutland]] region), France (Alsace and [[Moselle]] regions), Italy (Gressoney valley), Namibia, [[Poland]] ([[Bilingual communes in Poland|Opole]] region), and Russia (Asowo and Halbstadt).@@@@1@36@@danf@17-8-2009 10301140@unknown@formal@none@1@S@German is one of the 23 official [[languages of the European Union]].@@@@1@12@@danf@17-8-2009 10301150@unknown@formal@none@1@S@It is the language with the largest number of native speakers in the [[European Union]], and, shortly after English and long before French, the second-most spoken language in Europe.@@@@1@29@@danf@17-8-2009 10301160@unknown@formal@none@1@S@===German as a foreign language===@@@@1@5@@danf@17-8-2009 10301170@unknown@formal@none@1@S@German is the third most taught [[foreign language]] in the English speaking world after French and Spanish.@@@@1@17@@danf@17-8-2009 10301180@unknown@formal@none@1@S@German is the main language of about 90–95 million people in Europe (as of 2004), or 13.3% of all Europeans, being the second most spoken native language in Europe after [[Russian language|Russian]], above French (66.5 million speakers in 2004) and English (64.2 million speakers in 2004).@@@@1@46@@danf@17-8-2009 10301190@unknown@formal@none@1@S@It is therefore the most spoken first language in the EU.@@@@1@11@@danf@17-8-2009 10301200@unknown@formal@none@1@S@It is the second most known foreign language in the EU.@@@@1@11@@danf@17-8-2009 10301210@unknown@formal@none@1@S@It is one of the official languages of the European Union, and one of the three [[working language]]s of [[European Commission|the European Commission]], along with English and French.@@@@1@28@@danf@17-8-2009 10301220@unknown@formal@none@1@S@Thirty-two percent of citizens of the EU-15 countries say they can converse in German (either as a mother tongue or as a second or foreign language).@@@@1@26@@danf@17-8-2009 10301230@unknown@formal@none@1@S@This is assisted by the widespread availability of German TV by cable or satellite.@@@@1@14@@danf@17-8-2009 10301240@unknown@formal@none@1@S@German was once, and still remains to some extent, a [[lingua franca]] in Central, Eastern and [[Northern Europe]].@@@@1@18@@danf@17-8-2009 10301250@unknown@formal@none@1@S@==Dialects==@@@@1@1@@danf@17-8-2009 10301260@unknown@formal@none@1@S@German is a member of the [[West Germanic language|western branch]] of the [[Germanic languages|Germanic]] [[Language family|family of 
languages]], which in turn is part of the [[Indo-European language family]].@@@@1@28@@danf@17-8-2009 10301270@unknown@formal@none@1@S@The German dialect continuum is traditionally divided most broadly into [[High German languages|High German]] and Low German.@@@@1@17@@danf@17-8-2009 10301280@unknown@formal@none@1@S@The variation among the German dialects is considerable, with only the neighbouring dialects being mutually intelligible.@@@@1@16@@danf@17-8-2009 10301290@unknown@formal@none@1@S@Some dialects are not intelligible to people who only know standard German.@@@@1@12@@danf@17-8-2009 10301300@unknown@formal@none@1@S@However, all German dialects belong to the dialect continuum of High German and Low Saxon languages.@@@@1@16@@danf@17-8-2009 10301310@unknown@formal@none@1@S@Until roughly the end of the Second World War, there was a dialect continuum of all the continental West Germanic languages because nearly any pair of neighbouring dialects were perfectly mutually intelligible.@@@@1@32@@danf@17-8-2009 10301320@unknown@formal@none@1@S@=== Low German ===@@@@1@4@@danf@17-8-2009 10301330@unknown@formal@none@1@S@Low Saxon varieties (spoken on German territory) are considered linguistically a language separate from the German language by some, but just a dialect by others.@@@@1@25@@danf@17-8-2009 10301340@unknown@formal@none@1@S@Sometimes, Low Saxon and [[Low Franconian]] are grouped together because both are unaffected by the High German consonant shift.@@@@1@19@@danf@17-8-2009 10301350@unknown@formal@none@1@S@However, the part of the population capable of speaking and responding to it, or of understanding it has decreased continuously since WWII.@@@@1@22@@danf@17-8-2009 10301360@unknown@formal@none@1@S@Currently the effort to maintain a residual presence in cultural life is negligible.@@@@1@13@@danf@17-8-2009 10301370@unknown@formal@none@1@S@[[Middle Low German]] was the [[lingua franca]] of the [[Hanseatic League]].@@@@1@11@@danf@17-8-2009 10301380@unknown@formal@none@1@S@It was the predominant language in Northern Germany.@@@@1@8@@danf@17-8-2009 10301390@unknown@formal@none@1@S@This changed in the 16th century.@@@@1@6@@danf@17-8-2009 10301400@unknown@formal@none@1@S@In 1534 the [[Luther Bible]] by Martin Luther was printed.@@@@1@10@@danf@17-8-2009 10301410@unknown@formal@none@1@S@This translation is considered to be an important step towards the evolution of the Early New High German.@@@@1@18@@danf@17-8-2009 10301420@unknown@formal@none@1@S@It aimed to be understandable to an ample audience and was based mainly on Central and [[Upper German]] varieties.@@@@1@19@@danf@17-8-2009 10301430@unknown@formal@none@1@S@The Early New High German language gained more prestige than Low Saxon and became the language of science and literature.@@@@1@20@@danf@17-8-2009 10301440@unknown@formal@none@1@S@Other factors were that around the same time, the Hanseatic league lost its importance as new trade routes to [[Asia]] and the [[Americas]] were established, and that the most powerful German states of that period were located in Middle and Southern Germany.@@@@1@42@@danf@17-8-2009 10301450@unknown@formal@none@1@S@The 18th and 19th centuries were marked by mass [[education]], the language of the schools being standard German.@@@@1@18@@danf@17-8-2009 10301460@unknown@formal@none@1@S@Slowly Low Saxon was pushed back and back until it was nothing but a language spoken by the uneducated and at home.@@@@1@22@@danf@17-8-2009 10301470@unknown@formal@none@1@S@Today Low Saxon can be divided in two groups: Low Saxon 
varieties with a reasonable standard German influx and varieties of Standard German with a Low Saxon influence known as [[Missingsch]].@@@@1@31@@danf@17-8-2009 10301480@unknown@formal@none@1@S@=== High German ===@@@@1@4@@danf@17-8-2009 10301490@unknown@formal@none@1@S@High German is divided into [[Central German]] and [[Upper German language|Upper German]].@@@@1@12@@danf@17-8-2009 10301500@unknown@formal@none@1@S@Central German dialects include [[Ripuarian]], [[Moselle Franconian]], [[Hessian language|Hessian]], [[Thuringian]], [[South Franconian]], [[Lorraine Franconian]] and [[Upper Saxon dialect|Upper Saxon]].@@@@1@19@@danf@17-8-2009 10301510@unknown@formal@none@1@S@It is spoken in the southeastern Netherlands, eastern Belgium, Luxembourg, parts of France, and in Germany approximately between the River [[Main]] and the southern edge of the Lowlands.@@@@1@28@@danf@17-8-2009 10301520@unknown@formal@none@1@S@Modern Standard German is mostly based on Central German, but it should be noted that the common (but not linguistically correct) German term for modern Standard German is ''Hochdeutsch'', that is, ''High German''.@@@@1@33@@danf@17-8-2009 10301530@unknown@formal@none@1@S@The Moselle Franconian varieties spoken in Luxembourg have been officially standardised and institutionalised and are therefore usually considered a separate language known as [[Luxembourgish language|Luxembourgish]].@@@@1@25@@danf@17-8-2009 10301540@unknown@formal@none@1@S@Upper German dialects include [[Alemannic German|Alemannic]] (for instance [[Swiss German (linguistics)|Swiss German]]), [[Swabian German|Swabian]], [[East Franconian German|East Franconian]], [[Alsatian]] and [[Austro-Bavarian]].@@@@1@21@@danf@17-8-2009 10301550@unknown@formal@none@1@S@They are spoken in parts of the Alsace, southern Germany, Liechtenstein, Austria, and in the German-speaking parts of Switzerland and Italy.@@@@1@21@@danf@17-8-2009 10301560@unknown@formal@none@1@S@[[Wymysorys]], [[Sathmarisch]] and [[Siebenbürgisch]] are High German dialects of Poland and Romania respectively.@@@@1@13@@danf@17-8-2009 10301570@unknown@formal@none@1@S@The High German varieties spoken by [[Ashkenazi Jew]]s (mostly in the former [[Soviet Union]]) have several unique features, and are usually considered as a separate language, [[Yiddish]].@@@@1@27@@danf@17-8-2009 10301580@unknown@formal@none@1@S@It is the only Germanic language that does not use the [[Latin alphabet]] as its [[official script|standard script]].@@@@1@18@@danf@17-8-2009 10301590@unknown@formal@none@1@S@===German dialects versus varieties of standard German===@@@@1@7@@danf@17-8-2009 10301600@unknown@formal@none@1@S@In German [[linguistics]], German [[dialect]]s are distinguished from [[variety (linguistics)|varieties]] of [[standard German]].@@@@1@13@@danf@17-8-2009 10301610@unknown@formal@none@1@S@*The ''German dialects'' are the traditional local varieties.@@@@1@8@@danf@17-8-2009 10301620@unknown@formal@none@1@S@They are traditionally traced back to the different German tribes.@@@@1@10@@danf@17-8-2009 10301630@unknown@formal@none@1@S@Many of them are hardly understandable to someone who knows only standard German, since they often differ from standard German in [[lexicon]], [[phonology]] and [[syntax]].@@@@1@25@@danf@17-8-2009 10301640@unknown@formal@none@1@S@If a narrow definition of [[language]] based on [[mutual intelligibility]] is used, many German dialects are considered to be separate languages (for instance in the [[Ethnologue]]).@@@@1@26@@danf@17-8-2009 
10301650@unknown@formal@none@1@S@However, such a point of view is unusual in German linguistics.@@@@1@11@@danf@17-8-2009 10301660@unknown@formal@none@1@S@*The ''varieties of standard German'' refer to the different local varieties of the [[pluricentric language|pluricentric]] standard German.@@@@1@17@@danf@17-8-2009 10301670@unknown@formal@none@1@S@They only differ slightly in lexicon and phonology.@@@@1@8@@danf@17-8-2009 10301680@unknown@formal@none@1@S@In certain regions, they have replaced the traditional German dialects, especially in Northern Germany.@@@@1@14@@danf@17-8-2009 10301690@unknown@formal@none@1@S@==Grammar==@@@@1@1@@danf@17-8-2009 10301700@unknown@formal@none@1@S@German is an [[Fusional language|inflected language]].@@@@1@6@@danf@17-8-2009 10301710@unknown@formal@none@1@S@===Noun inflection===@@@@1@2@@danf@17-8-2009 10301720@unknown@formal@none@1@S@[[German nouns]] inflect into:@@@@1@4@@danf@17-8-2009 10301730@unknown@formal@none@1@S@* one of four [[Grammatical case|case]]s: [[nominative]], [[genitive]], [[dative case|dative]], and [[accusative case|accusative]].@@@@1@13@@danf@17-8-2009 10301740@unknown@formal@none@1@S@* one of three [[grammatical gender|genders]]: masculine, feminine, or neuter.@@@@1@10@@danf@17-8-2009 10301750@unknown@formal@none@1@S@Word endings sometimes reveal grammatical gender; for instance, nouns ending in '''...ung'''([[-ing]]), '''...e''','''...schaft'''([[-ship]]), '''...keit''' or '''...heit'''([[-hood]]) are feminine, while nouns ending in '''...chen''' or '''...lein''' ([[diminutive]] forms) are neuter and nouns ending in '''...ismus ([[-ism]])''' are masculine.@@@@1@37@@danf@17-8-2009 10301760@unknown@formal@none@1@S@Others are controversial, sometimes depending on the region in which it is spoken.@@@@1@13@@danf@17-8-2009 10301770@unknown@formal@none@1@S@Additionally, ambiguous endings exist, such as '''...er''' ([[-er]]), e.g. ''Feier (feminine)'', engl. ''celebration, party'', and ''Arbeiter (masculine)'', engl. 
''labourer''.@@@@1@19@@danf@17-8-2009 10301780@unknown@formal@none@1@S@Sentences can usually be reorganized to avoid a misunderstanding.@@@@1@9@@danf@17-8-2009 10301790@unknown@formal@none@1@S@* two numbers: singular and plural@@@@1@6@@danf@17-8-2009 10301800@unknown@formal@none@1@S@Although German is usually cited as an outstanding example of a highly inflected language, the degree of inflection is considerably less than in [[Old German]], or in other old [[Indo-European languages]] such as [[Latin]], [[Ancient Greek]], or [[Sanskrit]].@@@@1@38@@danf@17-8-2009 10301810@unknown@formal@none@1@S@The three genders have collapsed in the plural, which now behaves, grammatically, somewhat as a fourth gender.@@@@1@17@@danf@17-8-2009 10301820@unknown@formal@none@1@S@With four cases and three genders plus plural there are 16 distinct possible combinations of case and gender/number, but presently there are only six forms of the [[Article (grammar)|definite article]] used for the 16 possibilities.@@@@1@35@@danf@17-8-2009 10301830@unknown@formal@none@1@S@Inflection for case on the noun itself is required in the singular for strong masculine and neuter nouns in the genitive and sometimes in the dative.@@@@1@26@@danf@17-8-2009 10301840@unknown@formal@none@1@S@Both of these cases are losing way to substitutes in [[Natural language|informal speech]].@@@@1@13@@danf@17-8-2009 10301850@unknown@formal@none@1@S@The dative ending is considered somewhat old-fashioned in many contexts and often dropped, but it is still used in sayings and in formal speech or in written language.@@@@1@28@@danf@17-8-2009 10301860@unknown@formal@none@1@S@Weak masculine nouns share a common case ending for genitive, dative and accusative in the singular.@@@@1@16@@danf@17-8-2009 10301870@unknown@formal@none@1@S@Feminines are not declined in the singular.@@@@1@7@@danf@17-8-2009 10301880@unknown@formal@none@1@S@The plural does have an inflection for the dative.@@@@1@9@@danf@17-8-2009 10301890@unknown@formal@none@1@S@In total, seven inflectional endings (not counting plural markers) exist in German: ''-s, -es, -n, -ns, -en, -ens, -e''.@@@@1@19@@danf@17-8-2009 10301900@unknown@formal@none@1@S@In the German orthography, nouns and most words with the syntactical function of nouns are capitalised, which is supposed to make it easier for readers to find out what function a word has within the sentence (''Am Freitag bin ich einkaufen gegangen.'' — "On Friday I went shopping."; ''Eines Tages war er endlich da.'' — "One day he finally showed up".)@@@@1@61@@danf@17-8-2009 10301910@unknown@formal@none@1@S@This spelling convention is almost unique to German today (shared perhaps only by the closely related [[Luxemburgish language]]), although it was historically common in other languages (e.g., Danish and English), too.@@@@1@31@@danf@17-8-2009 10301920@unknown@formal@none@1@S@Like most Germanic languages, German forms left-branching noun [[compound (linguistics)|compound]]s, where the first noun modifies the category given by the second, for example: ''Hundehütte'' (eng. ''dog hut''; specifically: ''doghouse'').@@@@1@29@@danf@17-8-2009 10301930@unknown@formal@none@1@S@Unlike English, where newer compounds or combinations of longer nouns are often written in ''open'' form with separating spaces, German (like the other German languages) nearly always uses the ''closed'' form without spaces, for example: Baumhaus (eng. 
''tree house'').@@@@1@39@@danf@17-8-2009 10301940@unknown@formal@none@1@S@Like English, German allows arbitrarily long compounds, but these are rare.@@@@1@11@@danf@17-8-2009 10301950@unknown@formal@none@1@S@(''See also'' [[English compounds]].)@@@@1@4@@danf@17-8-2009 10301960@unknown@formal@none@1@S@The longest German word verified to be actually in (albeit very limited) use is [[Rinderkennzeichnungs- und Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz|Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz]]. [which, literally translated, breaks up into: Rind (cattle) - Fleisch (meat) - Etikettierung(s) (labelling) - Überwachung(s) (supervision) - Aufgaben (duties) - Übertragung(s) (assignment) - Gesetz (law), so "Beef labelling supervision duty assignment law".]@@@@1@50@@danf@17-8-2009 10301970@unknown@formal@none@1@S@===Verb inflection===@@@@1@2@@danf@17-8-2009 10301980@unknown@formal@none@1@S@Standard German verbs inflect into:@@@@1@5@@danf@17-8-2009 10301990@unknown@formal@none@1@S@* one of two conjugation classes, [[weak verb|weak]] and [[strong verb|strong]] (like English).@@@@1@13@@danf@17-8-2009 10302000@unknown@formal@none@1@S@(There is actually a third class, known as mixed verbs, which exhibit inflections combining features of both the strong and weak patterns.)@@@@1@22@@danf@17-8-2009 10302010@unknown@formal@none@1@S@* three persons: 1st, 2nd, 3rd.@@@@1@6@@danf@17-8-2009 10302020@unknown@formal@none@1@S@* two numbers: singular and plural@@@@1@6@@danf@17-8-2009 10302030@unknown@formal@none@1@S@* three [[Grammatical mood|mood]]s: Indicative, Subjunctive, Imperative@@@@1@7@@danf@17-8-2009 10302040@unknown@formal@none@1@S@* two [[Grammatical voice|genera verbi]]: active and passive; the passive being composed and dividable into static and dynamic.@@@@1@18@@danf@17-8-2009 10302050@unknown@formal@none@1@S@* two non-composed tenses ([[present tense|present]], [[preterite]]) and four composed tenses ([[perfect tense|perfect]], [[pluperfect]], [[Future tense|future]] and [[Future perfect tense|future perfect]])@@@@1@21@@danf@17-8-2009 10302060@unknown@formal@none@1@S@* distinction between [[grammatical aspect]]s is rendered by combined use of subjunctive and/or preterite marking; thus: neither of both is plain indicative voice, sole subjunctive conveys second-hand information, subjunctive plus Preterite marking forms the conditional state, and sole preterite is either plain indicative (in the past), or functions as a (literal) alternative for either second-hand-information or for the conditional state of the verb, when one of them may seem indistinguishable otherwise.@@@@1@71@@danf@17-8-2009 10302070@unknown@formal@none@1@S@* distinction between perfect and [[Continuous and progressive aspects|progressive aspect]] is and has at every stage of development been at hand as a productive category of the older language and in nearly all documented dialects, but, strangely enough, is nowadays rigorously excluded from written usage in its present normalised form.@@@@1@50@@danf@17-8-2009 10302080@unknown@formal@none@1@S@* disambiguation of completed vs. 
uncompleted forms is widely observed and regularly generated by common prefixes (blicken - to look, erblicken - to see [unrelated form: sehen - to see]).@@@@1@30@@danf@17-8-2009 10302090@unknown@formal@none@1@S@====Verb prefixes====@@@@1@2@@danf@17-8-2009 10302100@unknown@formal@none@1@S@There are also many ways to expand, and sometimes radically change, the meaning of a base verb through a relatively small number of prefixes.@@@@1@24@@danf@17-8-2009 10302110@unknown@formal@none@1@S@Some of those prefixes have a meaning themselves (Example: zer- refers to the destruction of things, as in zerreißen = to tear apart, zerbrechen = to break apart, zerschneiden = to cut apart), others do not have more than the vaguest meaning in and of themselves (Example: ver- , as in versuchen = to try, vernehmen = to interrogate, verteilen = to distribute, verstehen = to understand).@@@@1@53@@danf@17-8-2009 10302120@unknown@formal@none@1@S@More examples: haften = to stick, verhaften = to imprison; kaufen = to buy, verkaufen = to sell; hören = to hear, aufhören = to cease; fahren = to drive, erfahren = to get to know, to hear about something.@@@@1@24@@danf@17-8-2009 10302130@unknown@formal@none@1@S@=====Separable prefixes=====@@@@1@2@@danf@17-8-2009 10302140@unknown@formal@none@1@S@Many [[German verbs]] have a separable prefix, often with an adverbial function.@@@@1@12@@danf@17-8-2009 10302150@unknown@formal@none@1@S@In [[finite verb]] forms this is split off and moved to the end of the clause, and is hence considered by some to be a "resultative particle".@@@@1@27@@danf@17-8-2009 10302160@unknown@formal@none@1@S@For example, ''mitgehen'' meaning "to go with" would be split giving ''Gehen Sie mit?''@@@@1@14@@danf@17-8-2009 10302170@unknown@formal@none@1@S@(Literal: "Go you with?" 
; Formal: "Are you going along"?).@@@@1@10@@danf@17-8-2009 10302180@unknown@formal@none@1@S@Indeed, several [[parenthetic]]al clauses may occur between the prefix of a finite verb and its complement; e.g.@@@@1@17@@danf@17-8-2009 10302190@unknown@formal@none@1@S@:''Er '''kam''' am Freitagabend nach einem harten Arbeitstag und dem üblichen Ärger, der ihn schon seit Jahren immer wieder an seinem Arbeitsplatz plagt, mit fraglicher Freude auf ein Mahl, das seine Frau ihm, wie er hoffte, bereits aufgetischt hatte, endlich zu Hause '''an''' ''.@@@@1@44@@danf@17-8-2009 10302200@unknown@formal@none@1@S@A literal translation of this example might look like this:@@@@1@10@@danf@17-8-2009 10302210@unknown@formal@none@1@S@:He '''arr-''' on a Friday evening after a hard day at work and the usual disagreements that had been troubling him repeatedly, looking forward to a questionable meal which, as he hoped, his wife had already fixed for him, '''-ived''' at home.@@@@1@42@@danf@17-8-2009 10302220@unknown@formal@none@1@S@===Word order===@@@@1@2@@danf@17-8-2009 10302230@unknown@formal@none@1@S@German requires that a verbal element (main verb or [[auxiliary verb]]) appear second in the sentence, preceded by the most important topical phrase.@@@@1@23@@danf@17-8-2009 10302240@unknown@formal@none@1@S@The second most important phrase appears at the end of the sentence.@@@@1@12@@danf@17-8-2009 10302250@unknown@formal@none@1@S@For a sentence without an auxiliary, this gives several options:@@@@1@10@@danf@17-8-2009 10302260@unknown@formal@none@1@S@: ''{{lang|de|Der alte Mann gibt mir das Buch heute.}}''@@@@1@9@@danf@17-8-2009 10302265@unknown@formal@none@1@S@(The old man gives me the book today)@@@@1@8@@danf@17-8-2009 10302270@unknown@formal@none@1@S@: ''{{lang|de|Der alte Mann gibt mir heute das Buch.}}''@@@@1@9@@danf@17-8-2009 10302280@unknown@formal@none@1@S@: ''{{lang|de|Das Buch gibt mir der alte Mann heute.}}''@@@@1@9@@danf@17-8-2009 10302290@unknown@formal@none@1@S@: ''{{lang|de|Das Buch gibt der alte Mann heute mir.}}'' ([[stress (linguistics)|stress]] on ''mir'')@@@@1@13@@danf@17-8-2009 10302300@unknown@formal@none@1@S@: ''{{lang|de|Das Buch gibt heute der alte Mann mir.}}'' (as well)@@@@1@11@@danf@17-8-2009 10302310@unknown@formal@none@1@S@: ''{{lang|de|Das Buch gibt der alte Mann mir heute.}}''@@@@1@9@@danf@17-8-2009 10302320@unknown@formal@none@1@S@: ''{{lang|de|Das Buch gibt heute mir der alte Mann.}}''@@@@1@9@@danf@17-8-2009 10302330@unknown@formal@none@1@S@: ''{{lang|de|Das Buch gibt mir heute der alte Mann.}}''@@@@1@9@@danf@17-8-2009 10302340@unknown@formal@none@1@S@: ''{{lang|de|Heute gibt mir der alte Mann das Buch.}}''@@@@1@9@@danf@17-8-2009 10302350@unknown@formal@none@1@S@: ''{{lang|de|Heute gibt mir das Buch der alte Mann.}}''@@@@1@9@@danf@17-8-2009 10302360@unknown@formal@none@1@S@: ''{{lang|de|Heute gibt der alte Mann mir das Buch.}}''@@@@1@9@@danf@17-8-2009 10302370@unknown@formal@none@1@S@: ''{{lang|de|Mir gibt der alte Mann das Buch heute.}}''@@@@1@9@@danf@17-8-2009 10302380@unknown@formal@none@1@S@: ''{{lang|de|Mir gibt heute der alte Mann das Buch.}}''@@@@1@9@@danf@17-8-2009 10302390@unknown@formal@none@1@S@: ''{{lang|de|Mir gibt der alte Mann heute das Buch.}}''@@@@1@9@@danf@17-8-2009 10302400@unknown@formal@none@1@S@The position of a noun as a subject or object in a German sentence doesn't affect the meaning of the sentence as it would in English.@@@@1@26@@danf@17-8-2009 10302410@unknown@formal@none@1@S@In a [[Sentence (linguistics)|declarative sentence]] in English if the subject does 
not occur before the predicate the sentence could well be misunderstood.@@@@1@22@@danf@17-8-2009 10302420@unknown@formal@none@1@S@For example, in the sentence "Man bites dog" it is clear who did what to whom.@@@@1@16@@danf@17-8-2009 10302430@unknown@formal@none@1@S@To exchange the place of the subject with that of the object — "Dog bites man" — changes the meaning completely.@@@@1@21@@danf@17-8-2009 10302440@unknown@formal@none@1@S@In other words the word order in a sentence conveys significant information.@@@@1@12@@danf@17-8-2009 10302450@unknown@formal@none@1@S@In German, nouns and articles are declined as in Latin thus indicating whether it is the [[subject (linguistics)|subject]] or [[object (linguistics)|object]] of the verb's action.@@@@1@25@@danf@17-8-2009 10302460@unknown@formal@none@1@S@The above example in German would be ''{{lang|de|Ein Mann beißt den Hund}}'' or ''{{lang|de|Den Hund beißt ein Mann}}'' with both having exactly the same meaning.@@@@1@25@@danf@17-8-2009 10302470@unknown@formal@none@1@S@If the articles are omitted, which is sometimes done in headlines (''{{lang|de|Mann beißt Hund}}''), the syntax applies as in English — the first noun is the subject and the noun following the predicate is the object.@@@@1@36@@danf@17-8-2009 10302480@unknown@formal@none@1@S@Except for emphasis, adverbs of time have to appear in the third place in the sentence, just after the predicate.@@@@1@20@@danf@17-8-2009 10302490@unknown@formal@none@1@S@Otherwise the speaker would be recognised as non-German.@@@@1@8@@danf@17-8-2009 10302500@unknown@formal@none@1@S@For instance the German word order (in Modern English) is: We're going tomorrow to town. (''{{lang|de|Wir gehen morgen in die Stadt.}}'')@@@@1@21@@danf@17-8-2009 10302510@unknown@formal@none@1@S@====Auxiliary verbs====@@@@1@2@@danf@17-8-2009 10302520@unknown@formal@none@1@S@When an [[auxiliary verb]] is present, the auxiliary appears in second position, and the main verb appears at the end.@@@@1@20@@danf@17-8-2009 10302530@unknown@formal@none@1@S@This occurs notably in the creation of the [[perfect tense]].@@@@1@10@@danf@17-8-2009 10302540@unknown@formal@none@1@S@Many word orders are still possible, e.g.:@@@@1@7@@danf@17-8-2009 10302550@unknown@formal@none@1@S@:''{{lang|de|Der alte Mann hat mir das Buch gestern gegeben.}}''@@@@1@9@@danf@17-8-2009 10302555@unknown@formal@none@1@S@(The old man gave me the book yesterday.)@@@@1@8@@danf@17-8-2009 10302560@unknown@formal@none@1@S@:''{{lang|de|Der alte Mann hat mir gestern das Buch gegeben.}}''@@@@1@9@@danf@17-8-2009 10302570@unknown@formal@none@1@S@:''{{lang|de|Das Buch hat mir der alte Mann gestern gegeben.}}''@@@@1@9@@danf@17-8-2009 10302580@unknown@formal@none@1@S@:''{{lang|de|Das Buch hat mir gestern der alte Mann gegeben.}}''@@@@1@9@@danf@17-8-2009 10302590@unknown@formal@none@1@S@:''{{lang|de|Gestern hat mir der alte Mann das Buch gegeben.}}''@@@@1@9@@danf@17-8-2009 10302600@unknown@formal@none@1@S@:''{{lang|de|Gestern hat mir das Buch der alte Mann gegeben.}}''@@@@1@9@@danf@17-8-2009 10302610@unknown@formal@none@1@S@The word order is generally less rigid than in Modern English except for nouns (see below).@@@@1@16@@danf@17-8-2009 10302620@unknown@formal@none@1@S@There are two common [[word order]]s; one is for main [[clause]]s and another for [[subordinate clause]]s.@@@@1@16@@danf@17-8-2009 10302630@unknown@formal@none@1@S@In normal positive sentences the ''inflected'' verb always has position 2; in questions, exclamations and wishes it always has position 1.@@@@1@21@@danf@17-8-2009 
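The verb-second rule just described can be made concrete with a short sketch. The following Python snippet is not part of the original article; the constituent list, variable names and output format are illustrative assumptions. It holds the finite verb ''gibt'' in the second slot and permutes the remaining constituents of the example sentence ''Der alte Mann gibt mir das Buch heute'':

```python
from itertools import permutations

# Illustrative sketch of the German verb-second (V2) constraint:
# in a declarative main clause the finite verb keeps the second slot,
# while the other constituents can be reordered for emphasis.
finite_verb = "gibt"
constituents = ["der alte Mann", "mir", "das Buch", "heute"]

for first, *rest in permutations(constituents):
    # Capitalize only the first letter of the fronted constituent.
    sentence = f"{first[0].upper()}{first[1:]} {finite_verb} {' '.join(rest)}."
    print(sentence)
```

This prints all 24 candidate orders; the variants listed above are among them. Not every generated order is equally natural (which constituent is fronted depends on emphasis, as described above), but in every one the finite verb stays in second position.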
10302640@unknown@formal@none@1@S@In subordinate clauses the verb is supposed to occur at the very end, but in speech this rule is often disregarded.@@@@1@21@@danf@17-8-2009 10302650@unknown@formal@none@1@S@For example, in a [[Dependent clause|subordinate clause]] introduced by "weil" ("because") the verb quite often occupies the same position as in a [[Independent clause|main clause]].@@@@1@25@@danf@17-8-2009 10302660@unknown@formal@none@1@S@The correct way of saying "because I'm broke" is ''"{{lang|de|…weil ich pleite bin.}}"''.@@@@1@13@@danf@17-8-2009 10302670@unknown@formal@none@1@S@In the vernacular you may hear instead ''"{{lang|de|…weil ich bin pleite.}}"''@@@@1@11@@danf@17-8-2009 10302675@unknown@formal@none@1@S@This phenomenon may be caused by mixing the word-order pattern used for the word ''{{lang|de|weil}}'' with the pattern used for an alternative word for "because", ''{{lang|de|denn}}'', which is used with the main clause order (''"{{lang|de|…denn ich bin pleite.}}"'').@@@@1@38@@danf@17-8-2009 10302680@unknown@formal@none@1@S@====Modal verbs====@@@@1@2@@danf@17-8-2009 10302690@unknown@formal@none@1@S@Sentences using modal verbs place the infinitive at the end.@@@@1@10@@danf@17-8-2009 10302700@unknown@formal@none@1@S@For example, the sentence in Modern English "Should he go home?" would be rearranged in German to say "Should he (to) home go?" (''{{lang|de|Soll er nach Hause gehen?}}'').@@@@1@28@@danf@17-8-2009 10302710@unknown@formal@none@1@S@Thus in sentences with several subordinate or relative clauses the infinitives are clustered at the end.@@@@1@16@@danf@17-8-2009 10302720@unknown@formal@none@1@S@Compare the similar clustering of prepositions in the following English sentence: "What did you bring that book that I don't like to be read to out of up for?"@@@@1@29@@danf@17-8-2009 10302730@unknown@formal@none@1@S@====Multiple infinitives====@@@@1@2@@danf@17-8-2009 10302740@unknown@formal@none@1@S@The number of infinitives at the end is usually restricted to two, causing the third infinitive or auxiliary verb that would have gone at the very end to be placed instead at the beginning of the chain of verbs.@@@@1@39@@danf@17-8-2009 10302750@unknown@formal@none@1@S@For example, the sentence "Should he move into the house that he has just had renovated?" 
would be rearranged to "Should he into the house move, that he just renovated had?".@@@@1@32@@danf@17-8-2009 10302755@unknown@formal@none@1@S@(''{{lang|de|Soll er in das Haus einziehen, das er gerade hat renovieren lassen?}}'').@@@@1@12@@danf@17-8-2009 10302760@unknown@formal@none@1@S@The older form would have been (''{{lang|de|Soll er in das Haus, das er gerade hat renovieren lassen, einziehen?}}'').@@@@1@18@@danf@17-8-2009 10302770@unknown@formal@none@1@S@If there are more than three infinitives, all except the first two are relocated to the beginning of the chain.@@@@1@20@@danf@17-8-2009 10302780@unknown@formal@none@1@S@Needless to say, the rule is not rigorously applied.@@@@1@9@@danf@17-8-2009 10302790@unknown@formal@none@1@S@==Vocabulary==@@@@1@1@@danf@17-8-2009 10302800@unknown@formal@none@1@S@Most German vocabulary is derived from the Germanic branch of the Indo-European language family, although there are significant minorities of words derived from Latin and [[Greek language|Greek]], and a smaller amount from French and, most recently, English.@@@@1@38@@danf@17-8-2009 10302810@unknown@formal@none@1@S@At the same time, the effectiveness of the German language in forming equivalents for foreign words from its inherited Germanic stem repertory is great.@@@@1@24@@danf@17-8-2009 10302820@unknown@formal@none@1@S@Thus, [[Notker Labeo]] was able to translate Aristotelian treatises in pure (Old High) German in the decades after the year 1000.@@@@1@21@@danf@17-8-2009 10302830@unknown@formal@none@1@S@Overall, German has fewer Romance-language loanwords than does English.@@@@1@9@@danf@17-8-2009 10302840@unknown@formal@none@1@S@The coining of new, autochthonous words gave German a vocabulary of an estimated 40,000 words as early as the ninth century.@@@@1@21@@danf@17-8-2009 10302850@unknown@formal@none@1@S@In comparison, Latin, with a written tradition of nearly 2,500 years in an empire which ruled the Mediterranean, has grown to no more than 45,000 words today.@@@@1@27@@danf@17-8-2009 10302860@unknown@formal@none@1@S@Even today, many low-key scholarly movements try to promote the ''[[Ersatz]]'' (substitution) of virtually all foreign words with ancient, dialectal, or [[neologism|neologistic]] German alternatives.@@@@1@24@@danf@17-8-2009 10302870@unknown@formal@none@1@S@It is claimed that this would also help in spreading modern or scientific notions among the less educated, and thus democratise public life, too.@@@@1@24@@danf@17-8-2009 10302880@unknown@formal@none@1@S@Jurisprudence in Germany, for example, uses perhaps the "purest" tongue in terms of "Germanness", but also the most cumbersome, to be found today.@@@@1@23@@danf@17-8-2009 10302890@unknown@formal@none@1@S@In the modern scientific German vocabulary database in Leipzig (as of July 2003), there are nine million words and word groups in 35 million sentences (out of a corpus of 500 million words).@@@@1@34@@danf@17-8-2009 10302900@unknown@formal@none@1@S@==Writing system==@@@@1@2@@danf@17-8-2009 10302910@unknown@formal@none@1@S@=== Present ===@@@@1@3@@danf@17-8-2009 10302920@unknown@formal@none@1@S@German is written using the Latin alphabet.@@@@1@7@@danf@17-8-2009 10302930@unknown@formal@none@1@S@In addition to the 26 standard letters, German has three vowels with [[Umlaut (diacritic)|Umlaut]], namely ''ä'', ''ö'' and ''ü'', as well as the Eszett or ''[[scharfes s]]'' (sharp s), ''[[ß]]''.@@@@1@30@@danf@17-8-2009 10302940@unknown@formal@none@1@S@Before the German spelling reform of 1996, ''ß'' replaced ''ss'' after [[Vowel length|long 
vowels]] and diphthongs and before consonants, word-, or partial-word-endings.@@@@1@22@@danf@17-8-2009 10302950@unknown@formal@none@1@S@In reformed spelling, ''ß'' replaces ''ss'' only after long vowels and diphthongs.@@@@1@12@@danf@17-8-2009 10302960@unknown@formal@none@1@S@Since there is no [[capital ß]], it is always written as SS when capitalization is required.@@@@1@16@@danf@17-8-2009 10302970@unknown@formal@none@1@S@For example, ''Maßband'' (tape measure) is capitalized ''MASSBAND''.@@@@1@8@@danf@17-8-2009 10302980@unknown@formal@none@1@S@An exception is the use of ß in legal documents and forms when capitalizing names.@@@@1@15@@danf@17-8-2009 10302990@unknown@formal@none@1@S@To avoid confusion with similar names, a "ß" is to be used instead of "SS".@@@@1@15@@danf@17-8-2009 10303000@unknown@formal@none@1@S@(So: "KREßLEIN" instead of "KRESSLEIN".)@@@@1@5@@danf@17-8-2009 10303010@unknown@formal@none@1@S@A capital ß has been proposed and included in [[Unicode]], but it is not yet recognized as standard German.@@@@1@19@@danf@17-8-2009 10303020@unknown@formal@none@1@S@In [[Switzerland]], ß is not used at all.@@@@1@8@@danf@17-8-2009 10303030@unknown@formal@none@1@S@Umlaut vowels (ä, ö, ü) are commonly transcribed as ae, oe, and ue if the umlauts are not available on the keyboard used.@@@@1@23@@danf@17-8-2009 10303040@unknown@formal@none@1@S@In the same manner, ß can be transcribed as ss. German readers understand those transcriptions (although they look unusual), but they are avoided if the regular umlauts are available because they are considered a makeshift, not proper spelling.@@@@1@38@@danf@17-8-2009 10303050@unknown@formal@none@1@S@(In Westphalia, city and family names exist where the extra e has a vowel lengthening effect, e.g. ''Raesfeld'' [ˈraːsfɛlt] and ''Coesfeld'' [ˈkoːsfɛlt], but this use of the letter e after a/o/u does not occur in the present-day spelling of words other than [[proper noun]]s.@@@@1@44@@danf@17-8-2009 10303060@unknown@formal@none@1@S@)@@@@1@1@@danf@17-8-2009 10303070@unknown@formal@none@1@S@Unfortunately, there is still no general agreement on exactly where these umlauts occur in the sorting sequence.@@@@1@16@@danf@17-8-2009 10303080@unknown@formal@none@1@S@Telephone directories treat them by replacing them with the base vowel followed by an e, whereas dictionaries use just the base vowel.@@@@1@22@@danf@17-8-2009 10303090@unknown@formal@none@1@S@As an example, in a [[Telephone directory|telephone book]] ''Ärzte'' occurs after ''Adressenverlage'' but before ''Anlagenbauer'' (because Ä is replaced by Ae).@@@@1@21@@danf@17-8-2009 10303100@unknown@formal@none@1@S@In a dictionary ''Ärzte'' occurs after ''Arzt'' but before ''Asbest'' (because Ä is treated as A).@@@@1@16@@danf@17-8-2009 10303110@unknown@formal@none@1@S@In some older dictionaries or indexes, initial ''Sch'' and ''St'' are treated as separate letters and are listed as separate entries after ''S''.@@@@1@23@@danf@17-8-2009 10303120@unknown@formal@none@1@S@=== Past ===@@@@1@3@@danf@17-8-2009 10303130@unknown@formal@none@1@S@Until the early 20th century, German was mostly printed in [[blackletter]] [[typefaces]] (mostly in [[fraktur (typeface)|Fraktur]], but also in [[Schwabacher]]) and written in corresponding [[Penmanship|handwriting]] (for example [[Kurrent]] and [[Sütterlin]]).@@@@1@30@@danf@17-8-2009 10303140@unknown@formal@none@1@S@These variants of the Latin alphabet are very different from the serif or [[Sans-serif|sans serif]] [[Antiqua]] typefaces used today, and particularly the handwritten 
forms are difficult for the untrained to read.@@@@1@31@@danf@17-8-2009 10303150@unknown@formal@none@1@S@The printed forms, however, were claimed by some to be actually more readable when used for printing [[Germanic language]]s.@@@@1@20@@danf@17-8-2009 10303160@unknown@formal@none@1@S@The [[Nazis]] initially promoted Fraktur and Schwabacher since they were considered [[Aryan]], although they later abolished them in 1941 by claiming that these letters were Jewish.@@@@1@26@@danf@17-8-2009 10303170@unknown@formal@none@1@S@The latter fact is not widely known anymore; today the letters are often associated with the Nazis and are no longer commonly used.@@@@1@24@@danf@17-8-2009 10303180@unknown@formal@none@1@S@The Fraktur script remains present in everyday life through road signs, pub signs, beer brands and other forms of advertisement, where it is used to convey a certain rusticality and oldness.@@@@1@31@@danf@17-8-2009 10303190@unknown@formal@none@1@S@Proper use of the [[long s]] (''langes s''), [[Long s|ſ]], is essential to write German text in [[Fraktur (script)|Fraktur]] typefaces.@@@@1@21@@danf@17-8-2009 10303200@unknown@formal@none@1@S@Many [[Antiqua script|Antiqua]] typefaces also include the [[long s]].@@@@1@9@@danf@17-8-2009 10303210@unknown@formal@none@1@S@A specific set of rules applies to the use of the long s in German text, but nowadays it is rarely used in Antiqua typesetting.@@@@1@24@@danf@17-8-2009 10303220@unknown@formal@none@1@S@Any lower case "s" at the beginning of a syllable would be a long s, as opposed to a terminal s or short s (the more common variation of the letter s), which marks the end of a syllable; for example, in differentiating between the words ''Wachſtube'' (=guard-house) and ''Wachstube'' (=tube of floor polish).@@@@1@54@@danf@17-8-2009 10303230@unknown@formal@none@1@S@One can easily decide which "s" to use by appropriate hyphenation ("Wach-ſtube" vs. 
"Wachs-tube").@@@@1@14@@danf@17-8-2009 10303240@unknown@formal@none@1@S@The long s only appears in [[lower case]].@@@@1@8@@danf@17-8-2009 10303250@unknown@formal@none@1@S@The widespread ignorance of the correct use of the Fraktur scripts shows, however, in the many mistakes made, such as the frequent erroneous use of the round s instead of the [[long s]] at the beginning of a syllable, the failure to employ the mandatory [[Typographical ligature|ligature]]s of Fraktur, or the use of letter-forms more similar to the Antiqua for certain especially hard-to-read Fraktur letters.@@@@1@65@@danf@17-8-2009 10303260@unknown@formal@none@1@S@==Phonology==@@@@1@1@@danf@17-8-2009 10303270@unknown@formal@none@1@S@===Vowels===@@@@1@1@@danf@17-8-2009 10303280@unknown@formal@none@1@S@German vowels (excluding diphthongs; see below) come in ''short'' and ''long'' varieties, as detailed in the following table:@@@@1@18@@danf@17-8-2009 10303290@unknown@formal@none@1@S@Short {{IPA|/ɛ/}} is realised as {{IPA|[ɛ]}} in stressed syllables (including [[secondary stress]]), but as {{IPA|[ǝ]}} in unstressed syllables.@@@@1@18@@danf@17-8-2009 10303300@unknown@formal@none@1@S@Note that stressed short {{IPA|/ɛ/}} can be spelled either with ''e'' or with ''ä'' (''hätte'' 'would have' and ''Kette'' 'chain', for instance, rhyme).@@@@1@23@@danf@17-8-2009 10303310@unknown@formal@none@1@S@In general, the short vowels are open and the long vowels are closed.@@@@1@13@@danf@17-8-2009 10303320@unknown@formal@none@1@S@The one exception is the open {{IPA|/ɛː/}} sound of long Ä; in some varieties of standard German, {{IPA|/ɛː/}} and {{IPA|/eː/}} have merged into {{IPA|[eː]}}, removing this anomaly.@@@@1@27@@danf@17-8-2009 10303330@unknown@formal@none@1@S@In that case, pairs like ''Bären/Beeren'' 'bears/berries' or ''Ähre/Ehre'' 'spike/honour' become homophonous.@@@@1@12@@danf@17-8-2009 10303340@unknown@formal@none@1@S@In many varieties of standard German, an unstressed {{IPA|/ɛr/}} is not pronounced as {{IPA|[ər]}}, but vocalised to {{IPA|[ɐ]}}.@@@@1@18@@danf@17-8-2009 10303350@unknown@formal@none@1@S@Whether any particular vowel letter represents the long or short phoneme is not completely predictable, although the following regularities exist:@@@@1@20@@danf@17-8-2009 10303360@unknown@formal@none@1@S@* If a vowel (other than ''i'') is at the end of a syllable or followed by a single consonant, it is usually pronounced long (e.g. ''Hof'' [hoːf]).@@@@1@28@@danf@17-8-2009 10303370@unknown@formal@none@1@S@* If the vowel is followed by a double consonant (e.g. ''ff'', ''ss'' or ''tt''), ''ck'', ''tz'' or a [[consonant cluster]] (e.g. ''st'' or ''nd''), it is nearly always short (e.g. ''hoffen'' [ˈhɔfǝn]).@@@@1@33@@danf@17-8-2009 10303380@unknown@formal@none@1@S@Double consonants are used only for this function of marking preceding vowels as short; the consonant itself is never pronounced lengthened or doubled.@@@@1@23@@danf@17-8-2009 10303390@unknown@formal@none@1@S@Both of these rules have exceptions (e.g. 
''hat'' [hat] 'has' is short despite the first rule; ''Kloster'' {{IPA|[kloːstər]}}, '[[cloister]]'; ''Mond'' {{IPA|[moːnt]}}, '[[moon]]' are long despite the second rule).@@@@1@28@@danf@17-8-2009 10303400@unknown@formal@none@1@S@For an ''i'' that is neither in the combination ''ie'' (making it long) nor followed by a double consonant or cluster (making it short), there is no general rule.@@@@1@29@@danf@17-8-2009 10303410@unknown@formal@none@1@S@In some cases, there are regional differences: In central Germany (Hessen), the ''o'' in the [[Noun#Proper nouns and common nouns|proper name]] "Hoffmann" is pronounced long while most other Germans would pronounce it short; the same applies to the ''e'' in the geographical name "Mecklenburg" for people in that region.@@@@1@49@@danf@17-8-2009 10303420@unknown@formal@none@1@S@The word ''Städte'' 'cities', is pronounced with a short vowel {{IPA|[ˈʃtɛtə]}} by some (Jan Hofer, ARD Television) and with a long vowel {{IPA|[ˈʃtɛːtə]}} by others (Marietta Slomka, ZDF Television).@@@@1@29@@danf@17-8-2009 10303430@unknown@formal@none@1@S@Finally, a vowel followed by ''ch'' can be short (''Fach'' {{IPA|[fax]}} 'compartment', ''Küche'' {{IPA|[ˈkʏçe]}} 'kitchen') or long (''Suche'' {{IPA|[ˈzuːxǝ]}} 'search', ''Bücher'' {{IPA|[ˈbyːçər]}} 'books') almost at random.@@@@1@26@@danf@17-8-2009 10303440@unknown@formal@none@1@S@Thus, ''Lache'' is homographous: {{IPA|[la:xe]}} 'puddle' and {{IPA|[laxe]}} 'manner of laughing' (coll.), 'laugh!'@@@@1@13@@danf@17-8-2009 10303450@unknown@formal@none@1@S@(Imp.).@@@@1@1@@danf@17-8-2009 10303460@unknown@formal@none@1@S@German vowels can form the following digraphs (in writing) and diphthongs (in pronunciation); note that the pronunciation of some of them (ei, äu, eu) is very different from what one would expect when considering the component letters:@@@@1@37@@danf@17-8-2009 10303470@unknown@formal@none@1@S@Additionally, the digraph ''ie'' generally represents the phoneme {{IPA|/iː/}}, which is not a diphthong.@@@@1@14@@danf@17-8-2009 10303480@unknown@formal@none@1@S@In many varieties, a /r/ at the end of a syllable is vocalised.@@@@1@13@@danf@17-8-2009 10303490@unknown@formal@none@1@S@However, a sequence of a vowel followed by such a vocalised /r/ is not considered a diphthong: Bär {{IPA|[bɛːɐ̯]}} 'bear', er {{IPA|[eːɐ̯]}} 'he', wir {{IPA|[viːɐ̯]}} 'we', Tor {{IPA|[toːɐ̯]}} 'gate', kurz {{IPA|[kʊɐ̯ts]}} 'short', Wörter {{IPA|[vœɐ̯tɐ]}} 'words'.@@@@1@35@@danf@17-8-2009 10303500@unknown@formal@none@1@S@In most varieties of standard German, word stems that begin with a vowel are preceded by a [[glottal stop]] [ʔ].@@@@1@20@@danf@17-8-2009 10303510@unknown@formal@none@1@S@===Consonants===@@@@1@1@@danf@17-8-2009 10303520@unknown@formal@none@1@S@* '''c''' standing by itself is not a German letter.@@@@1@10@@danf@17-8-2009 10303530@unknown@formal@none@1@S@In borrowed words, it is usually pronounced [ʦ] (before ä, äu, e, i, ö, ü, y) or [k] (before a, o, u, or before consonants).@@@@1@25@@danf@17-8-2009 10303540@unknown@formal@none@1@S@The combination '''ck''' is, as in English, used to indicate that the preceding vowel is short.@@@@1@16@@danf@17-8-2009 10303550@unknown@formal@none@1@S@* '''ch''' occurs most often and is pronounced either [ç] (after ä, ai, äu, e, ei, eu, i, ö, ü and after consonants) or [x] (after a, au, o, u).@@@@1@30@@danf@17-8-2009 10303560@unknown@formal@none@1@S@Ch never occurs at the beginning of an originally German word.@@@@1@11@@danf@17-8-2009 10303570@unknown@formal@none@1@S@In borrowed words with initial Ch 
there is no general agreement on the pronunciation.@@@@1@14@@danf@17-8-2009 10303580@unknown@formal@none@1@S@For example, the word ''"Chemie"'' (chemistry) can be pronounced [keːˈmiː], [çeːˈmiː] or [ʃeːˈmiː] depending on dialect.@@@@1@16@@danf@17-8-2009 10303590@unknown@formal@none@1@S@* '''dsch''' is pronounced [ʤ] (like ''j'' in ''jungle'') but appears only in a few [[loanwords]].@@@@1@16@@danf@17-8-2009 10303600@unknown@formal@none@1@S@* '''f''' is pronounced [f] as in "''f''ather".@@@@1@8@@danf@17-8-2009 10303610@unknown@formal@none@1@S@* '''h''' is pronounced [h] as in "''h''ome" at the beginning of a syllable.@@@@1@14@@danf@17-8-2009 10303620@unknown@formal@none@1@S@After a vowel it is silent and only lengthens the vowel (e.g. ''"Reh"'' = [[roe deer]]).@@@@1@16@@danf@17-8-2009 10303630@unknown@formal@none@1@S@* '''j''' is pronounced [j] in Germanic words (''"Jahr"'' [jaːɐ]).@@@@1@10@@danf@17-8-2009 10303640@unknown@formal@none@1@S@In younger loanwords, it follows more or less the respective languages' pronunciations.@@@@1@12@@danf@17-8-2009 10303650@unknown@formal@none@1@S@* '''l''' is always pronounced [l], never [ɫ] (the English "[[Dark L]]").@@@@1@12@@danf@17-8-2009 10303660@unknown@formal@none@1@S@* '''q''' only exists in combination with '''u''' and appears both in Germanic and Latin words (''"quer"''; ''"Qualität"'').@@@@1@18@@danf@17-8-2009 10303670@unknown@formal@none@1@S@It is pronounced [kv].@@@@1@4@@danf@17-8-2009 10303680@unknown@formal@none@1@S@* '''r''' is pronounced as a [[Guttural R|guttural sound]] (an [[uvular trill]], [ʀ]) in front of a vowel or consonant (''"Rasen"'' [ʀaːzən]; ''"Burg"'' like [buʀg]).@@@@1@25@@danf@17-8-2009 10303690@unknown@formal@none@1@S@In spoken German, however, it is commonly vocalised after a vowel (''"er"'' being pronounced rather like ['ɛɐ] - ''"Burg"'' [buɐg]).@@@@1@20@@danf@17-8-2009 10303700@unknown@formal@none@1@S@In some southern non-standard varieties, the '''r''' is pronounced as a tongue-tip r (the [[alveolar trill]]).@@@@1@16@@danf@17-8-2009 10303710@unknown@formal@none@1@S@* '''s''' in Germany is pronounced [z] (as in "''Z''ebra") if it forms the [[syllable onset]] (e.g. Sohn [zoːn]), otherwise [s] (e.g. Bus [bʊs]).@@@@1@24@@danf@17-8-2009 10303720@unknown@formal@none@1@S@In Austria, it is always pronounced [s].@@@@1@5@@danf@17-8-2009 10303730@unknown@formal@none@1@S@A '''ss''' [s] indicates that the preceding vowel is short. '''st''' and '''sp''' at the beginning of words of German origin are pronounced [ʃt] and [ʃp], respectively.@@@@1@27@@danf@17-8-2009 10303740@unknown@formal@none@1@S@* '''ß''' (a letter unique to German called "Eszett") was a ligature of a double '''s''' ''and'' of a '''sz''' and is always pronounced [s].@@@@1@25@@danf@17-8-2009 10303750@unknown@formal@none@1@S@Originating in [[Blackletter]] typeface, it traditionally replaced '''ss''' at the end of a syllable (e.g. 
''"ich muss"'' → ''"ich muß"''; ''"ich müsste"'' → ''"ich müßte"''); within a word it contrasts with '''ss''' [s] in indicating that the preceding vowel is long (compare ''"in Maßen"'' [in 'maːsən] "with moderation" and ''"in Massen"'' [in 'masən] "in loads").@@@@1@55@@danf@17-8-2009 10303760@unknown@formal@none@1@S@The use of '''ß''' has recently been limited by the latest German spelling reform and is no longer used for '''ss''' at the end of a syllable; Switzerland and Liechtenstein already abolished it in 1934.@@@@1@35@@danf@17-8-2009 10303770@unknown@formal@none@1@S@* '''sch''' is pronounced [ʃ] (like "sh" in "shine").@@@@1@9@@danf@17-8-2009 10303780@unknown@formal@none@1@S@* '''v''' is pronounced [f] in words of Germanic origin (e.g. ''"Vater"'' [ˈfaːtɐ]) and [v] in most other words (e.g. ''"Vase"'' [ˈvaːzǝ]).@@@@1@22@@danf@17-8-2009 10303790@unknown@formal@none@1@S@* '''w''' is pronounced [v] as in "''v''acation" (e.g. ''"was"'' [vas]).@@@@1@11@@danf@17-8-2009 10303800@unknown@formal@none@1@S@* '''y''' only appears in loanwords and is traditionally considered a vowel.@@@@1@12@@danf@17-8-2009 10303810@unknown@formal@none@1@S@* '''z''' is always pronounced [ʦ] (e.g. ''"zog"'' [ʦoːk]).@@@@1@9@@danf@17-8-2009 10303820@unknown@formal@none@1@S@A '''tz''' indicates that the preceding vowel is short.@@@@1@9@@danf@17-8-2009 10303830@unknown@formal@none@1@S@====Consonant shifts====@@@@1@2@@danf@17-8-2009 10303840@unknown@formal@none@1@S@German does not have any [[dental fricative]]s (as English '''th''').@@@@1@10@@danf@17-8-2009 10303850@unknown@formal@none@1@S@The '''th''' sounds, which the English language has inherited from [[Anglo-Saxons|Anglo Saxon]], survived on the continent up to Old High German and then disappeared in German with the consonant shifts between the 8th and the 10th century.@@@@1@37@@danf@17-8-2009 10303860@unknown@formal@none@1@S@It is sometimes possible to find parallels between English and German by replacing the English '''th''' with '''d''' in German: "Thank" → in German "Dank", "this" and "that" → "dies" and "das", "[[thou]]" (old 2nd person singular pronoun) → "du", "think" → "denken", "thirsty" → "durstig" and many other examples.@@@@1@48@@danf@17-8-2009 10303870@unknown@formal@none@1@S@Likewise, the '''gh''' in [[Germanic languages|Germanic]] English words, pronounced in several different ways in modern English (as an '''f''', or not at all), can often be linked to German '''ch''': "to laugh" → "lachen", "through" and "thorough" → "durch", "high" → "hoch", "naught" → "nichts", etc.@@@@1@46@@danf@17-8-2009 10303880@unknown@formal@none@1@S@==Cognates with English==@@@@1@3@@danf@17-8-2009 10303890@unknown@formal@none@1@S@There are many thousands of German words that are [[cognate]] to English words (in fact a sizeable fraction of native German and English vocabulary, although for various reasons much of it is not immediately obvious).@@@@1@35@@danf@17-8-2009 10303900@unknown@formal@none@1@S@Most of the words in the following table have almost the same meaning as in English.@@@@1@16@@danf@17-8-2009 10303910@unknown@formal@none@1@S@Compound word cognates@@@@1@3@@danf@17-8-2009 10303920@unknown@formal@none@1@S@When these cognates have slightly different consonants, this is often due to the High German consonant shift.@@@@1@17@@danf@17-8-2009 10303930@unknown@formal@none@1@S@Hence the affinity of English words with those of German dialects is more evident:@@@@1@14@@danf@17-8-2009 10303940@unknown@formal@none@1@S@There are cognates whose meanings in either language have changed 
through the centuries.@@@@1@13@@danf@17-8-2009 10303950@unknown@formal@none@1@S@It is sometimes difficult for both English and German speakers to discern the relationship.@@@@1@14@@danf@17-8-2009 10303960@unknown@formal@none@1@S@On the other hand, once the definitions are made clear, then the logical relation becomes obvious.@@@@1@16@@danf@17-8-2009 10303970@unknown@formal@none@1@S@Sometimes the generality or specificity of word pairs may be opposite in the two languages.@@@@1@15@@danf@17-8-2009 10303980@unknown@formal@none@1@S@German and English also share many borrowings from other languages, especially Latin, French and Greek.@@@@1@15@@danf@17-8-2009 10303990@unknown@formal@none@1@S@Most of these words have the same meaning, while a few have subtle differences in meaning.@@@@1@16@@danf@17-8-2009 10304000@unknown@formal@none@1@S@As many of these words have been borrowed by numerous languages, not only German and English, they are called ''[[internationalism (linguistics)|internationalisms]]'' in German linguistics.@@@@1@24@@danf@17-8-2009 10304010@unknown@formal@none@1@S@For reference, a good number of these borrowed words are of the neuter gender.@@@@1@14@@danf@17-8-2009 10304020@unknown@formal@none@1@S@==Words borrowed by English==@@@@1@4@@danf@17-8-2009 10304030@unknown@formal@none@1@S@:''For a list of German loanwords in English, see [[:Category:German loanwords]]''@@@@1@11@@danf@17-8-2009 10304040@unknown@formal@none@1@S@In the English language, there are also many words taken from German without any letter change, e.g.:@@@@1@17@@danf@17-8-2009 10304050@unknown@formal@none@1@S@==Names for German in other languages==@@@@1@6@@danf@17-8-2009 10304060@unknown@formal@none@1@S@:''See also: [[Deutsch]], [[Names for the Dutch language|Dutch]], [[Deitsch]], [[Dietsch]], [[Teuton]], [[Teutonic]], [[Allemanic]], [[Alleman]], [[Theodisca]]''@@@@1@15@@danf@17-8-2009 10304070@unknown@formal@none@1@S@The names that countries have for the language differ from region to region.@@@@1@13@@danf@17-8-2009 10304080@unknown@formal@none@1@S@In Italian the sole name for German is still ''tedesco'', from the Latin ''[[theodiscus]]'', meaning "vernacular".@@@@1@16@@danf@17-8-2009 10304090@unknown@formal@none@1@S@A possible explanation for the use of words meaning "mute" (e.g., ''nemoj'' in Russian, ''němý'' in Czech, ''nem'' in [[Serbian language|Serbian]]) to refer to German (and also to Germans) in Slavic languages is that Germans were the first people [[Slavic peoples|Slavic tribes]] encountered with whom they could not communicate.@@@@1@49@@danf@17-8-2009 10304100@unknown@formal@none@1@S@[[Romanian language|Romanian]] used to use the Slavonic term "nemţeşte", but "germană" is now widely used.@@@@1@15@@danf@17-8-2009 10304110@unknown@formal@none@1@S@Hungarian "német" is also of Slavonic origin.@@@@1@7@@danf@17-8-2009 10304120@unknown@formal@none@1@S@The [[Arabic language|Arabic]] name for Austria, النمسا ("an-namsa"), is derived from the Slavonic term.@@@@1@14@@danf@17-8-2009 10304130@unknown@formal@none@1@S@Note also that though the Russian term for the language is ''немецкий'' ''(nemetskij)'', the country is ''Германия'' ''(Germania)''.@@@@1@18@@danf@17-8-2009 10304140@unknown@formal@none@1@S@However, in certain other [[Slavic languages]], such as Czech, the country name (''Německo'') is similar to the name of the language, ''německý'' (jazyk).@@@@1@23@@danf@17-8-2009 10304150@unknown@formal@none@1@S@[[Finns]] and [[Estonians]] use the term ''saksa'', originally from the [[Saxon people|Saxon]] 
tribe.@@@@1@13@@danf@17-8-2009 10304160@unknown@formal@none@1@S@[[Scandinavians]] use derivatives of the word ''Tyskland/Þýskaland'' (from Theodisca) for the country and ''tysk(a)/þýska'' for the language.@@@@1@17@@danf@17-8-2009 10304170@unknown@formal@none@1@S@[[Hebrew language|Hebrew]] traditionally (nowadays this is not the case) used the Biblical term אַשְׁכֲּנָז ([[Ashkenaz]]) (Genesis 10:3) to refer to Germany, or to certain parts of it, and the [[Ashkenazi]] Jews are those who originate from Germany and [[Eastern Europe]] and formerly spoke Yiddish as their native language, derived from [[Middle High German]].@@@@1@53@@danf@17-8-2009 10304180@unknown@formal@none@1@S@Modern Hebrew uses גֶּרְמָנִי ''germaní'' (Or גֶּרְמָנִית ''germanít'' for the language).@@@@1@11@@danf@17-8-2009 10304190@unknown@formal@none@1@S@The French term is ''allemand'', the Spanish term is ''alemán'', the [[Catalan language|Catalan]] term is ''alemany'', and the [[Portuguese language|Portuguese]] term is ''alemão''; all derive from the ancient [[Alamanni]] tribal alliance, meaning literally "''All Men''".@@@@1@35@@danf@17-8-2009 10304200@unknown@formal@none@1@S@The [[Latvian language|Latvian]] term ''vācu'' means "tinny" and refers disparagingly to the iron-clad [[Teutonic Knights]] that colonized the Baltic in the Middle Ages.@@@@1@23@@danf@17-8-2009 10304210@unknown@formal@none@1@S@The [[Scottish Gaelic]] term for the German language, ''Gearmailtis'', is formed in the standard way of adding ''-(a)is'' to the end of the country name.@@@@1@25@@danf@17-8-2009 10304220@unknown@formal@none@1@S@See [[Names for Germany]] for further details on the origins of these and other terms.@@@@1@15@@danf@17-8-2009 10310010@unknown@formal@none@1@S@
GNU General Public License
@@@@1@4@@danf@17-8-2009 10310020@unknown@formal@none@1@S@The '''GNU General Public License''' ('''GNU GPL''' or simply '''GPL''') is a widely used [[free software license]], originally written by [[Richard Stallman]] for the [[GNU project]].@@@@1@26@@danf@17-8-2009 10310030@unknown@formal@none@1@S@The GPL is the most popular and well-known example of the type of strong [[copyleft]] license that requires derived works to be available under the same copyleft.@@@@1@27@@danf@17-8-2009 10310040@unknown@formal@none@1@S@Under this philosophy, the GPL is said to grant the recipients of a [[computer program]] the rights of the [[free software definition]] and uses copyleft to ensure the freedoms are preserved, even when the work is changed or added to.@@@@1@40@@danf@17-8-2009 10310050@unknown@formal@none@1@S@This is in distinction to [[permissive free software licenses]], of which the [[BSD licenses]] are the standard examples.@@@@1@18@@danf@17-8-2009 10310060@unknown@formal@none@1@S@The [[GNU Lesser General Public License]] (LGPL) is a modified, more permissive, version of the GPL, originally intended for some [[library (computing)|software libraries]].@@@@1@23@@danf@17-8-2009 10310070@unknown@formal@none@1@S@There is also a [[GNU Free Documentation License]], which was originally intended for use with documentation for GNU software, but has also been adopted for other uses, such as the [[Wikipedia]] project.@@@@1@32@@danf@17-8-2009 10310080@unknown@formal@none@1@S@The [[Affero General Public License]] (GNU AGPL) is a similar license with a focus on networking server software.@@@@1@18@@danf@17-8-2009 10310090@unknown@formal@none@1@S@The GNU AGPL is similar to the GNU General Public License, except that it additionally covers the use of the software over a computer network, requiring that the complete source code be made available to any network user of the AGPLed work, for example a web application.@@@@1@47@@danf@17-8-2009 10310100@unknown@formal@none@1@S@The Free Software Foundation recommends that this license is considered for any software that will commonly be run over the network.@@@@1@21@@danf@17-8-2009 10310110@unknown@formal@none@1@S@==History==@@@@1@1@@danf@17-8-2009 10310120@unknown@formal@none@1@S@The GPL was written by [[Richard Stallman]] in 1989 for use with programs released as part of the [[GNU project]].@@@@1@20@@danf@17-8-2009 10310130@unknown@formal@none@1@S@The original GPL was based on a unification of similar licenses used for early versions of [[GNU Emacs]], the [[GNU Debugger]] and the [[GNU Compiler Collection]].@@@@1@26@@danf@17-8-2009 10310140@unknown@formal@none@1@S@These licenses contained similar provisions to the modern GPL, but were specific to each program, rendering them incompatible, despite being the same license.@@@@1@23@@danf@17-8-2009 10310150@unknown@formal@none@1@S@Stallman's goal was to produce one license that could be used for any project, thus making it possible for many projects to share code.@@@@1@24@@danf@17-8-2009 10310160@unknown@formal@none@1@S@An important vote of confidence in the GPL came from [[Linus Torvalds]]' adoption of the license for the [[History of the Linux kernel|Linux kernel]] in 1992, switching from an earlier license that prohibited commercial distribution.@@@@1@35@@danf@17-8-2009 10310170@unknown@formal@none@1@S@As of August 2007, the GPL accounted for nearly 65% of the 43,442 free software projects listed on [[Freshmeat]], and [[As of 2006|as of January 2006]], about 68% of the projects listed on 
[[SourceForge.net]].@@@@1@34@@danf@17-8-2009 10310180@unknown@formal@none@1@S@Similarly, a 2001 survey of [[Red Hat Linux]] 7.1 found that 50% of the source code was licensed under the GPL, and a 1997 survey of [[Ibiblio|MetaLab]], then the largest free software archive, showed that the GPL accounted for about half of the licenses used.@@@@1@45@@danf@17-8-2009 10310190@unknown@formal@none@1@S@One survey of a large repository of open-source software reported that in July 1997, about half the software packages with explicit license terms used the GPL.@@@@1@26@@danf@17-8-2009 10310200@unknown@formal@none@1@S@Prominent free software programs licensed under the GPL include the [[Linux kernel]] and the [[GNU Compiler Collection]] (GCC).@@@@1@18@@danf@17-8-2009 10310210@unknown@formal@none@1@S@Some other free software programs are [[dual-licensed]] under multiple licenses, often with one of the licenses being the GPL.@@@@1@19@@danf@17-8-2009 10310220@unknown@formal@none@1@S@Some observers believe that the strong [[copyleft]] provided by the GPL was crucial to the success of Linux, giving the programmers who contributed to it the confidence that their work would benefit the whole world and remain free, rather than being exploited by software companies that would not have to give anything back to the community.@@@@1@56@@danf@17-8-2009 10310230@unknown@formal@none@1@S@The second version of the license, version 2, was released in 1991.@@@@1@12@@danf@17-8-2009 10310240@unknown@formal@none@1@S@Over the following 15 years, some members of the [[free software community|FOSS (Free and Open Source Software) community]] came to believe that some software and hardware vendors were finding loopholes in the GPL, allowing GPL-licensed software to be exploited in ways that were contrary to the intentions of the programmers.@@@@1@50@@danf@17-8-2009 10310250@unknown@formal@none@1@S@These concerns included [[tivoization]] (the inclusion of GPL-licensed software in hardware that will refuse to run modified versions of its software); the use of unpublished, modified versions of GPL software behind web interfaces; and patent deals between [[Microsoft]] and Linux and Unix distributors that may represent an attempt to use patents as a weapon against competition from Linux.@@@@1@58@@danf@17-8-2009 10310260@unknown@formal@none@1@S@Version 3 was developed to attempt to address these concerns.@@@@1@10@@danf@17-8-2009 10310270@unknown@formal@none@1@S@It was [http://www.fsf.org/news/gplv3_launched officially released] on [[June 29]], [[2007]].@@@@1@9@@danf@17-8-2009 10310280@unknown@formal@none@1@S@==Versions==@@@@1@1@@danf@17-8-2009 10310290@unknown@formal@none@1@S@===Version 1===@@@@1@2@@danf@17-8-2009 10310300@unknown@formal@none@1@S@Version 1 of the GNU GPL, released in January 1989, prevented what were then the two main ways that software distributors restricted the freedoms that define free software.@@@@1@28@@danf@17-8-2009 10310310@unknown@formal@none@1@S@The first problem was that distributors might publish [[binary file]]s only – executable, but not readable or modifiable by humans.@@@@1@20@@danf@17-8-2009 10310320@unknown@formal@none@1@S@To prevent this, GPLv1 said that any vendor distributing binaries must also make the human-readable source code available under the same licensing terms.@@@@1@24@@danf@17-8-2009 10310330@unknown@formal@none@1@S@The second problem was that distributors might add additional restrictions, either by adding restrictions to the license, or by combining the software with other software which 
had other restrictions on its distribution.@@@@1@32@@danf@17-8-2009 10310340@unknown@formal@none@1@S@If this was done, then the union of the two sets of restrictions would apply to the combined work, thus unacceptable restrictions could be added.@@@@1@25@@danf@17-8-2009 10310350@unknown@formal@none@1@S@To prevent this, GPLv1 said that modified versions, as a whole, had to be distributed under the terms in GPLv1.@@@@1@20@@danf@17-8-2009 10310360@unknown@formal@none@1@S@Therefore, software distributed under the terms of GPLv1 could be combined with software under more permissive terms, as this would not change the terms under which the whole could be distributed, but software distributed under GPLv1 could not be combined with software distributed under a more restrictive license, as this would conflict with the requirement that the whole be distributable under the terms of GPLv1.@@@@1@65@@danf@17-8-2009 10310370@unknown@formal@none@1@S@===Version 2===@@@@1@2@@danf@17-8-2009 10310380@unknown@formal@none@1@S@According to Richard Stallman, the major change in GPLv2 was the "Liberty or Death" clause, as he calls it - Section 7.@@@@1@22@@danf@17-8-2009 10310390@unknown@formal@none@1@S@This section says that if someone has restrictions imposed that ''prevent'' him or her from distributing GPL-covered software in a way that respects other users' freedom (for example, if a legal ruling states that he or she can only distribute the software in binary form), he or she cannot distribute it at all.@@@@1@53@@danf@17-8-2009 10310400@unknown@formal@none@1@S@By 1990, it was becoming apparent that a less restrictive license would be strategically useful for some software libraries; when version 2 of the GPL (GPLv2) was released in June 1991, therefore, a second license - the Library General Public License (LGPL) was introduced at the same time and numbered with version 2 to show that both were complementary.@@@@1@59@@danf@17-8-2009 10310410@unknown@formal@none@1@S@The version numbers diverged in 1999 when version 2.1 of the LGPL was released, which renamed it the [[GNU Lesser General Public License]] to reflect its place in the GNU philosophy.@@@@1@31@@danf@17-8-2009 10310420@unknown@formal@none@1@S@===Version 3===@@@@1@2@@danf@17-8-2009 10310430@unknown@formal@none@1@S@In late 2005, the [[Free Software Foundation]] (FSF) announced work on version 3 of the GPL (GPLv3).@@@@1@17@@danf@17-8-2009 10310440@unknown@formal@none@1@S@On [[January 16]], [[2006]], the first "discussion draft" of GPLv3 was published, and the public consultation began.@@@@1@17@@danf@17-8-2009 10310450@unknown@formal@none@1@S@The public consultation was originally planned for nine to fifteen months but finally stretched to eighteen months with four drafts being published.@@@@1@22@@danf@17-8-2009 10310460@unknown@formal@none@1@S@The official GPLv3 was released by FSF on [[June 29]], [[2007]].@@@@1@11@@danf@17-8-2009 10310470@unknown@formal@none@1@S@GPLv3 was written by [[Richard Stallman]], with legal counsel from [[Eben Moglen]] and [[Software Freedom Law Center]].@@@@1@17@@danf@17-8-2009 10310480@unknown@formal@none@1@S@According to Stallman, the most important changes are in relation to [[Software patents and free software|software patents]], [[free software license]] compatibility, the definition of "source code", and hardware restrictions on software modification ("[[tivoization]]").@@@@1@33@@danf@17-8-2009 10310490@unknown@formal@none@1@S@Other changes relate to internationalisation, how license violations are handled, and how 
additional permissions can be granted by the copyright holder.@@@@1@21@@danf@17-8-2009 10310500@unknown@formal@none@1@S@Other notable changes include allowing authors to add certain additional conditions or requirements to their contributions.@@@@1@16@@danf@17-8-2009 10310510@unknown@formal@none@1@S@One of those new optional requirements, sometimes referred to as the Affero clause, is intended to fulfill a request regarding [[software as a service]]; permitting the addition of this requirement makes GPLv3 compatible with the [[Affero General Public License]].@@@@1@39@@danf@17-8-2009 10310520@unknown@formal@none@1@S@The public consultation process was coordinated by the Free Software Foundation with assistance from [[Software Freedom Law Center]], [[Free Software Foundation Europe]], and other free software groups.@@@@1@27@@danf@17-8-2009 10310530@unknown@formal@none@1@S@Comments were collected from the public via the gplv3.fsf.org web portal.@@@@1@11@@danf@17-8-2009 10310540@unknown@formal@none@1@S@That portal runs purpose-written software called [[stet (software)|stet]].@@@@1@8@@danf@17-8-2009 10310550@unknown@formal@none@1@S@These comments were passed to four committees comprising approximately 130 people, including supporters and detractors of FSF's goals.@@@@1@18@@danf@17-8-2009 10310560@unknown@formal@none@1@S@Those committees researched the comments submitted by the public and passed their summaries to Stallman for a decision on what the license would do.@@@@1@24@@danf@17-8-2009 10310570@unknown@formal@none@1@S@During the public consultation process, 962 comments were submitted for the first draft.@@@@1@13@@danf@17-8-2009 10310580@unknown@formal@none@1@S@By the end, a total of 2,636 comments had been submitted.@@@@1@11@@danf@17-8-2009 10310590@unknown@formal@none@1@S@The third draft was released on [[March 28]], [[2007]].@@@@1@9@@danf@17-8-2009 10310600@unknown@formal@none@1@S@This draft included language intended to prevent patent cross-licenses like the controversial [[Novell#Agreement with Microsoft|Microsoft-Novell patent agreement]] and restricted the anti-tivoization clauses to a legal definition of a "User" or "consumer product."@@@@1@32@@danf@17-8-2009 10310610@unknown@formal@none@1@S@It also explicitly removed the section on "Geographical Limitations", whose probable removal had been announced at the launch of the public consultation.@@@@1@22@@danf@17-8-2009 10310620@unknown@formal@none@1@S@The fourth discussion draft, which was the last, was released on [[May 31]], [[2007]].@@@@1@14@@danf@17-8-2009 10310630@unknown@formal@none@1@S@It introduced [[Apache Software License]] compatibility, clarified the role of outside contractors, and made an exception to permit the Microsoft-Novell agreement, saying in section 11 paragraph 6 that@@@@1@28@@danf@17-8-2009 10310640@unknown@formal@none@1@S@This aims to make future such deals ineffective.@@@@1@8@@danf@17-8-2009 10310650@unknown@formal@none@1@S@The license is also meant to cause Microsoft to extend the patent licenses it grants to Novell customers for the use of GPLv3 software to ''all'' users of that GPLv3 software; this is possible only if Microsoft is legally a "conveyor" of the GPLv3 software.@@@@1@45@@danf@17-8-2009 10310660@unknown@formal@none@1@S@Others, notably some high-profile developers of the [[Linux kernel]], commented to the mass media and made public statements about their objections to parts of discussion drafts 1 and 2.@@@@1@29@@danf@17-8-2009 10310670@unknown@formal@none@1@S@== Terms and conditions 
==@@@@1@5@@danf@17-8-2009 10310680@unknown@formal@none@1@S@The terms and conditions of the GPL are available to anybody receiving a copy of the work that has a GPL applied to it ("the licensee").@@@@1@26@@danf@17-8-2009 10310690@unknown@formal@none@1@S@Any licensee who adheres to the terms and conditions is given permission to modify the work, as well as to copy and redistribute the work or any derivative version.@@@@1@29@@danf@17-8-2009 10310700@unknown@formal@none@1@S@The licensee is allowed to charge a fee for this service, or do this free of charge.@@@@1@17@@danf@17-8-2009 10310710@unknown@formal@none@1@S@This latter point distinguishes the GPL from software licenses that prohibit commercial redistribution.@@@@1@13@@danf@17-8-2009 10310720@unknown@formal@none@1@S@The FSF argues that free software should not place restrictions on commercial use, and the GPL explicitly states that GPL works may be sold at any price.@@@@1@27@@danf@17-8-2009 10310730@unknown@formal@none@1@S@The GPL additionally states that a distributor may not impose "further restrictions on the rights granted by the GPL".@@@@1@19@@danf@17-8-2009 10310740@unknown@formal@none@1@S@This forbids activities such as distributing of the software under a non-disclosure agreement or contract.@@@@1@15@@danf@17-8-2009 10310750@unknown@formal@none@1@S@Distributors under the GPL also grant a license for any of their patents practiced by the software, to practice those patents in GPL software.@@@@1@24@@danf@17-8-2009 10310760@unknown@formal@none@1@S@Section three of the license requires that programs distributed as pre-compiled binaries are accompanied by a copy of the source code, a written offer to distribute the source code via the same mechanism as the pre-compiled binary or the written offer to obtain the source code that you got when you received the pre-compiled binary under the GPL.@@@@1@58@@danf@17-8-2009 10310770@unknown@formal@none@1@S@=== Copyleft ===@@@@1@3@@danf@17-8-2009 10310780@unknown@formal@none@1@S@The distribution rights granted by the GPL for modified versions of the work are not unconditional.@@@@1@16@@danf@17-8-2009 10310790@unknown@formal@none@1@S@When someone distributes a GPL'd work plus their own modifications, the requirements for distributing the whole work cannot be any greater than the requirements that are in the GPL.@@@@1@29@@danf@17-8-2009 10310800@unknown@formal@none@1@S@This requirement is known as copyleft.@@@@1@6@@danf@17-8-2009 10310810@unknown@formal@none@1@S@It earns its legal power from the use of [[copyright]] on software programs.@@@@1@13@@danf@17-8-2009 10310820@unknown@formal@none@1@S@Because a GPL work is copyrighted, a licensee has no right to redistribute it, not even in modified form (barring [[fair use]]), except under the terms of the license.@@@@1@29@@danf@17-8-2009 10310830@unknown@formal@none@1@S@One is only required to adhere to the terms of the GPL if one wishes to exercise rights normally restricted by copyright law, such as redistribution.@@@@1@26@@danf@17-8-2009 10310840@unknown@formal@none@1@S@Conversely, if one distributes copies of the work without abiding by the terms of the GPL (for instance, by keeping the source code secret), he or she can be [[lawsuit|sued]] by the original author under copyright law.@@@@1@37@@danf@17-8-2009 10310850@unknown@formal@none@1@S@Copyleft thus uses copyright law to accomplish the opposite of its usual purpose: instead of imposing restrictions, it grants rights to other people, in a way that ensures the rights cannot subsequently be taken 
away.@@@@1@35@@danf@17-8-2009 10310860@unknown@formal@none@1@S@It also ensures that unlimited redistribution rights are not granted, should any legal flaw (or "[[computer bug|bug]]") be found in the copyleft statement.@@@@1@23@@danf@17-8-2009 10310870@unknown@formal@none@1@S@Many distributors of GPL'ed programs bundle the source code with the [[executable]]s.@@@@1@12@@danf@17-8-2009 10310880@unknown@formal@none@1@S@An alternative method of satisfying the copyleft is to provide a written offer to provide the source code on a physical medium (such as a CD) upon request.@@@@1@28@@danf@17-8-2009 10310890@unknown@formal@none@1@S@In practice, many GPL'ed programs are distributed over the [[Internet]], and the source code is made available over [[File Transfer Protocol|FTP]].@@@@1@21@@danf@17-8-2009 10310900@unknown@formal@none@1@S@For Internet distribution, this complies with the license.@@@@1@8@@danf@17-8-2009 10310910@unknown@formal@none@1@S@Copyleft applies only when a person seeks to redistribute the program.@@@@1@11@@danf@17-8-2009 10310920@unknown@formal@none@1@S@One is allowed to make private modified versions, without any obligation to divulge the modifications as long as the modified software is not distributed to anyone else.@@@@1@27@@danf@17-8-2009 10310930@unknown@formal@none@1@S@Note that the copyleft applies only to the software and not to its output (unless that output is itself a derivative work of the program); for example, a public web portal running a modified derivative of a GPL'ed [[content management system]] is not required to distribute its changes to the underlying software.@@@@1@52@@danf@17-8-2009 10310940@unknown@formal@none@1@S@==Licensing and contractual issues==@@@@1@4@@danf@17-8-2009 10310950@unknown@formal@none@1@S@The GPL was designed as a [[license]], rather than a [[contract]].@@@@1@11@@danf@17-8-2009 10310960@unknown@formal@none@1@S@In some [[Common Law]] jurisdictions, the legal distinction between a license and a contract is an important one: contracts are enforceable by [[contract law]], whereas licenses are enforced under [[copyright law]].@@@@1@31@@danf@17-8-2009 10310970@unknown@formal@none@1@S@However, this distinction is not useful in the many jurisdictions where there are no differences between contracts and licenses, such as [[Civil law (legal system)|Civil Law]] systems.@@@@1@27@@danf@17-8-2009 10310980@unknown@formal@none@1@S@Those who do not agree to the GPL's terms and conditions do not have permission, under copyright law, to copy or distribute GPL licensed software or derivative works.@@@@1@28@@danf@17-8-2009 10310990@unknown@formal@none@1@S@However, they may still use the software however they like.@@@@1@10@@danf@17-8-2009 10311000@unknown@formal@none@1@S@== Copyright holders ==@@@@1@4@@danf@17-8-2009 10311010@unknown@formal@none@1@S@The text of the GPL is itself copyrighted, and the copyright is held by the [[Free Software Foundation]] (FSF).@@@@1@19@@danf@17-8-2009 10311020@unknown@formal@none@1@S@However, the FSF does not hold the copyright for a work released under the GPL, unless an author explicitly assigns copyrights to the FSF (which seldom happens except for programs that are part of the [[GNU]] project).@@@@1@37@@danf@17-8-2009 10311030@unknown@formal@none@1@S@Only the individual copyright holders have the authority to sue when a license violation takes place.@@@@1@16@@danf@17-8-2009 10311040@unknown@formal@none@1@S@The FSF permits people to create new licenses based on the GPL, as long as the derived licenses do not use the GPL 
preamble without permission.@@@@1@26@@danf@17-8-2009 10311050@unknown@formal@none@1@S@This is discouraged, however, since such a license is generally incompatible with the GPL.@@@@1@14@@danf@17-8-2009 10311060@unknown@formal@none@1@S@(See the [http://www.fsf.org/licenses/gpl-faq.html#ModifyGPL GPL FAQ] for more information.)@@@@1@8@@danf@17-8-2009 10311070@unknown@formal@none@1@S@Other licenses created by the GNU project include the [[GNU Lesser General Public License]] and the [[GNU Free Documentation License]].@@@@1@20@@danf@17-8-2009 10311080@unknown@formal@none@1@S@== The GPL in court ==@@@@1@6@@danf@17-8-2009 10311090@unknown@formal@none@1@S@A key dispute related to the GPL is whether or not non-GPL software can [[library linking|dynamically link]] to GPL libraries.@@@@1@20@@danf@17-8-2009 10311100@unknown@formal@none@1@S@The GPL is clear in requiring that all [[derivative work]]s of GPL'ed code must themselves be GPL'ed.@@@@1@17@@danf@17-8-2009 10311110@unknown@formal@none@1@S@However, it is not clear whether an executable that dynamically links to a GPL code should be considered a derivative work.@@@@1@21@@danf@17-8-2009 10311120@unknown@formal@none@1@S@The free/open-source software community is split on this issue.@@@@1@9@@danf@17-8-2009 10311130@unknown@formal@none@1@S@The FSF asserts that such an executable is indeed a derivative work if the executable and GPL code "make function calls to each other and share data structures," with others agreeing, while some (e.g. [[Linus Torvalds]]) agree that dynamic linking can create derived works but disagree over the circumstances.@@@@1@49@@danf@17-8-2009 10311150@unknown@formal@none@1@S@On the other hand, some experts have argued that the question is still open: one [[Novell]] lawyer has written that dynamic linking not being derivative "makes sense" but is not "clear-cut," and [[Lawrence Rosen]] has claimed that a court of law would "probably" exclude dynamic linking from derivative works although "there are also good arguments" on the other side and "the outcome is not clear" (on a later occasion, he argued that "market-based" factors are more important than the linking technique).@@@@1@81@@danf@17-8-2009 10311160@unknown@formal@none@1@S@This is ultimately a question not of the GPL ''per se'', but of how copyright law defines derivative works.@@@@1@19@@danf@17-8-2009 10311170@unknown@formal@none@1@S@In ''[[Galoob v. Nintendo]]'' the [[Ninth Circuit Court of Appeals]] defined a derivative work as having "'form' or permanence" and noted that "the infringing work must incorporate a portion of the copyrighted work in some form," but there have been no clear court decisions to resolve this particular conflict.@@@@1@49@@danf@17-8-2009 10311180@unknown@formal@none@1@S@Since there is no record of anyone circumventing the GPL by dynamic linking and contesting when threatened with lawsuits by the copyright holder, the restriction appears ''[[de facto]]'' enforceable even if not yet proven ''[[de jure]]''.@@@@1@36@@danf@17-8-2009 10311190@unknown@formal@none@1@S@In 2002, MySQL AB sued Progress NuSphere for copyright and trademark infringement in [[U.S. 
District Court for the District of Massachusetts|United States district court]].@@@@1@24@@danf@17-8-2009 10311200@unknown@formal@none@1@S@NuSphere had allegedly violated MySQL's copyright by linking code for the Gemini table type into the MySQL server.@@@@1@18@@danf@17-8-2009 10311210@unknown@formal@none@1@S@After a preliminary hearing before Judge [[Patti Saris]] on [[February 27]], [[2002]], the parties entered settlement talks and eventually settled.@@@@1@20@@danf@17-8-2009 10311220@unknown@formal@none@1@S@At the hearing, Judge Saris "saw no reason" that the GPL would not be enforceable.@@@@1@15@@danf@17-8-2009 10311230@unknown@formal@none@1@S@In August 2003, the [[SCO Group]] stated that they believed the GPL to have no legal validity, and that they intended to take up lawsuits over sections of code supposedly copied from SCO Unix into the [[Linux kernel]].@@@@1@38@@danf@17-8-2009 10311240@unknown@formal@none@1@S@This was a problematic stand for them, as they had distributed Linux and other GPL'ed code in their [[Caldera OpenLinux]] distribution, and there is little evidence that they had any legal right to do so except under the terms of the GPL.@@@@1@42@@danf@17-8-2009 10311250@unknown@formal@none@1@S@For more information, see [[SCO-Linux controversies]] and [[SCO v. IBM]].@@@@1@10@@danf@17-8-2009 10311260@unknown@formal@none@1@S@In April 2004, the [[netfilter/iptables]] project was granted a preliminary [[injunction]] against Sitecom Germany by [[Munich]] District Court after Sitecom refused to desist from distributing Netfilter's GPL'ed software in violation of the terms of the GPL.@@@@1@36@@danf@17-8-2009 10311270@unknown@formal@none@1@S@In July 2004, the German court confirmed this injunction as a final ruling against Sitecom.@@@@1@16@@danf@17-8-2009 10311280@unknown@formal@none@1@S@The court's justification for its decision exactly mirrored the predictions given earlier by the FSF's [[Eben Moglen]]:@@@@1@17@@danf@17-8-2009 10311290@unknown@formal@none@1@S@: ''Defendant has infringed on the copyright of plaintiff by offering the software 'netfilter/iptables' for download and by advertising its distribution, without adhering to the license conditions of the GPL.@@@@1@30@@danf@17-8-2009 10311300@unknown@formal@none@1@S@Said actions would only be permissible if defendant had a license grant...@@@@1@12@@danf@17-8-2009 10311310@unknown@formal@none@1@S@This is independent of the questions whether the licensing conditions of the GPL have been effectively agreed upon between plaintiff and defendant or not.@@@@1@24@@danf@17-8-2009 10311320@unknown@formal@none@1@S@If the GPL were not agreed upon by the parties, defendant would notwithstanding lack the necessary rights to copy, distribute, and make the software 'netfilter/iptables' publicly available.''@@@@1@27@@danf@17-8-2009 10311330@unknown@formal@none@1@S@This ruling was important because it was the first time that a court had confirmed that violating the terms of the GPL was an act of copyright violation.@@@@1@27@@danf@17-8-2009 10311340@unknown@formal@none@1@S@However, the case was not as crucial a test for the GPL as some have concluded.@@@@1@16@@danf@17-8-2009 10311350@unknown@formal@none@1@S@In the case, the enforceability of the GPL itself was not under attack.@@@@1@12@@danf@17-8-2009 10311360@unknown@formal@none@1@S@Instead, the court was merely attempting to discern if the license itself was in effect.@@@@1@15@@danf@17-8-2009 10311370@unknown@formal@none@1@S@In May [[2005]], [[Wallace versus International Business Machines et
al|Daniel Wallace]] filed suit against the [[Free Software Foundation]] (FSF) in the [[U.S. District Court for the Southern District of Indiana|Southern District of Indiana]], contending that the GPL is an illegal attempt to fix prices at zero.@@@@1@47@@danf@17-8-2009 10311380@unknown@formal@none@1@S@The suit was dismissed in March 2006, on the grounds that Wallace had failed to state a valid anti-trust claim; the court noted that "the GPL encourages, rather than discourages, free competition and the distribution of computer operating systems, the benefits of which directly pass to consumers."@@@@1@47@@danf@17-8-2009 10311390@unknown@formal@none@1@S@Wallace was denied the possibility of further amending his complaint, and was ordered to pay the FSF's legal expenses.@@@@1@19@@danf@17-8-2009 10311400@unknown@formal@none@1@S@On September 8, 2005, the Seoul Central District Court ruled that the GPL had no legal relevance to a case dealing with [[trade secret]]s derived from GPL-licensed work.@@@@1@26@@danf@17-8-2009 10311410@unknown@formal@none@1@S@The defendants argued that since it is impossible to maintain a trade secret while complying with the GPL and distributing the work, they were not in breach of trade secret law.@@@@1@27@@danf@17-8-2009 10311420@unknown@formal@none@1@S@The court considered this argument to be without grounds.@@@@1@6@@danf@17-8-2009 10311430@unknown@formal@none@1@S@On September 6, 2006, the [[gpl-violations.org]] project prevailed in court litigation against D-Link Germany GmbH regarding D-Link's inappropriate and copyright-infringing use of parts of the Linux kernel.@@@@1@30@@danf@17-8-2009 10311440@unknown@formal@none@1@S@The judgment finally provided the on-record legal precedent that the GPL is valid and legally binding, and that it will stand up in German court.@@@@1@25@@danf@17-8-2009 10311450@unknown@formal@none@1@S@In late 2007, the developers of [[BusyBox]] and the [[Software Freedom Law Center]] embarked upon a program to gain GPL compliance from distributors of BusyBox in [[embedded system]]s, suing those who would not comply.@@@@1@34@@danf@17-8-2009 10311460@unknown@formal@none@1@S@These were claimed to be the first uses of US courts to enforce GPL obligations.@@@@1@16@@danf@17-8-2009 10311470@unknown@formal@none@1@S@''See'' [[BusyBox#GPL lawsuits]].@@@@1@3@@danf@17-8-2009 10311480@unknown@formal@none@1@S@== Compatibility and multi-licensing==@@@@1@4@@danf@17-8-2009 10311490@unknown@formal@none@1@S@Many of the most common free software licenses, such as the original [[MIT License|MIT/X license]], the [[BSD license]] (in its current 3-clause form), and the [[GNU Lesser General Public License|LGPL]], are "GPL-[[License compatibility|compatible]]".@@@@1@33@@danf@17-8-2009 10311500@unknown@formal@none@1@S@That is, their code can be combined with a program under the GPL without conflict (the new combination would have the GPL applied to the whole).@@@@1@26@@danf@17-8-2009 10311510@unknown@formal@none@1@S@However, some free/open source software licenses are not GPL-compatible.@@@@1@9@@danf@17-8-2009 10311520@unknown@formal@none@1@S@Many GPL proponents have strongly advocated that free/open source software developers use only GPL-compatible licenses, because doing otherwise makes it difficult to reuse software in larger wholes.@@@@1@27@@danf@17-8-2009 10311530@unknown@formal@none@1@S@Note that this issue only arises in concurrent use of licenses which impose conditions on their manner of combination.@@@@1@19@@danf@17-8-2009
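As a simplified, hypothetical illustration of such a combination (the example below is not drawn from any real project), a GPL'ed program may incorporate a file distributed under the 3-clause BSD license, because the BSD terms impose no conditions that conflict with the GPL; the combined work as a whole, however, may then only be redistributed under the GPL:

 /* bsd_sum.c -- hypothetical helper taken from a 3-clause-BSD-licensed project;
  * its original copyright notice would be retained in this file. */
 int bsd_sum(int a, int b)
 {
     return a + b;
 }
 
 /* main.c -- the surrounding program, distributed under the GPL.
  * Linking the two files produces a combined work that, if distributed,
  * must be offered under the GPL as a whole. */
 #include <stdio.h>
 
 int bsd_sum(int a, int b);
 
 int main(void)
 {
     printf("2 + 3 = %d\n", bsd_sum(2, 3));
     return 0;
 }

Compiling the two files together (for example, with gcc main.c bsd_sum.c) yields a single binary; a distributor of that binary must provide the complete corresponding source code and may pass the combined work on only under the GPL, while the BSD-licensed file retains its own notice.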
10311540@unknown@formal@none@1@S@Some licenses, such as the BSD license, impose no conditions on the manner of their combination.@@@@1@16@@danf@17-8-2009 10311550@unknown@formal@none@1@S@Also see the [[list of FSF approved software licenses]] for examples of compatible and incompatible licenses.@@@@1@16@@danf@17-8-2009 10311560@unknown@formal@none@1@S@A number of businesses use [[dual-licensing]] to distribute a GPL version and sell a [[proprietary software|proprietary]] license to companies wishing to combine the package with proprietary code, using dynamic linking or not.@@@@1@32@@danf@17-8-2009 10311570@unknown@formal@none@1@S@Examples of such companies include [[MySQL AB]], [[Trolltech]] ([[Qt (toolkit)|Qt toolkit]]), [[Namesys]] ([[ReiserFS]]) and [[Red Hat]] ([[Cygwin]]).@@@@1@17@@danf@17-8-2009 10311580@unknown@formal@none@1@S@== Adoption ==@@@@1@3@@danf@17-8-2009 10311590@unknown@formal@none@1@S@The Open Source License Resource Center maintained by [[Black Duck Software]] shows that GPL is the license used in about 70% of all open source software.@@@@1@26@@danf@17-8-2009 10311600@unknown@formal@none@1@S@The vast majority of projects are released under GPL 2 with 3000 open source projects having migrated to GPL 3.@@@@1@20@@danf@17-8-2009 10311610@unknown@formal@none@1@S@==Criticism==@@@@1@1@@danf@17-8-2009 10311620@unknown@formal@none@1@S@In [[2001]] [[Microsoft]] [[CEO]] [[Steve Ballmer]] referred to Linux as "a cancer that attaches itself in an intellectual property sense to everything it touches."@@@@1@24@@danf@17-8-2009 10311630@unknown@formal@none@1@S@Critics of Microsoft claim that the real reason Microsoft dislikes the GPL is that the GPL resists proprietary vendors' attempts to "[[embrace, extend and extinguish]]".@@@@1@25@@danf@17-8-2009 10311640@unknown@formal@none@1@S@Microsoft has released [[Microsoft Windows Services for UNIX]] which contains GPL-licensed code.@@@@1@12@@danf@17-8-2009 10311650@unknown@formal@none@1@S@In response to Microsoft's attacks on the GPL, several prominent Free Software developers and advocates released a joint statement supporting the license.@@@@1@22@@danf@17-8-2009 10311660@unknown@formal@none@1@S@The GPL has been described as being [[Copyleft#Is copyleft .22viral.22.3F|"viral"]] by many of its critics because the GPL only allows conveyance of whole programs, which means that programmers are not allowed to convey programs that [[GPL linking exception|link]] to libraries having GPL-incompatible licenses.@@@@1@43@@danf@17-8-2009 10311670@unknown@formal@none@1@S@The so-called "viral" effect of this is that under such circumstances disparately licensed software cannot be combined unless one of the licenses is changed.@@@@1@24@@danf@17-8-2009 10311680@unknown@formal@none@1@S@Although theoretically either license could be changed, in the "viral" scenario the GPL cannot be practically changed (because the software may have so many contributors, some of whom will likely refuse), whereas the license of the other software ''can'' be practically changed.@@@@1@42@@danf@17-8-2009 10311690@unknown@formal@none@1@S@This is part of a [[BSD and GPL licensing|philosophical difference]] between the GPL and permissive free software licenses such as the [[BSD licenses|BSD-style licenses]], which do not put such a requirement on modified versions.@@@@1@34@@danf@17-8-2009 10311700@unknown@formal@none@1@S@While proponents of the GPL believe that free software should ensure that its freedoms are preserved all the way from the developer to the user, others believe that 
intermediaries between the developer and the user should be free to redistribute the software as non-free software.@@@@1@45@@danf@17-8-2009 10311710@unknown@formal@none@1@S@More specifically, the GPL requires that redistribution occur subject to the GPL, whereas more "permissive" licenses allow redistribution to occur under licenses more restrictive than the original license.@@@@1@28@@danf@17-8-2009 10311720@unknown@formal@none@1@S@While the GPL does allow commercial distribution of GPL software, the market price will settle near the price of distribution—near zero—since the purchasers may redistribute the software and its source code for their cost of redistribution.@@@@1@36@@danf@17-8-2009 10311730@unknown@formal@none@1@S@This could be seen to inhibit commercial use of GPL'ed code by others wishing to use that code for proprietary purposes—if they don't wish to avail themselves of GPL'ed code, they will have to re-implement it themselves.@@@@1@37@@danf@17-8-2009 10311740@unknown@formal@none@1@S@Microsoft has included anti-GPL terms in its open source software.@@@@1@10@@danf@17-8-2009 10311750@unknown@formal@none@1@S@In addition, the [[FreeBSD]] project has stated that "a less publicized and unintended use of the GPL is that it is very favorable to large companies that want to undercut software companies.@@@@1@32@@danf@17-8-2009 10311760@unknown@formal@none@1@S@In other words, the GPL is well suited for use as a marketing weapon, potentially reducing overall economic benefit and contributing to monopolistic behavior".@@@@1@24@@danf@17-8-2009 10311770@unknown@formal@none@1@S@It is not clear that there are any cases of this happening in practice, however.@@@@1@14@@danf@17-8-2009 10311780@unknown@formal@none@1@S@The GPL has no [[Indemnity|indemnification]] clause explicitly protecting maintainers and developers from litigation resulting from unscrupulous contribution.@@@@1@17@@danf@17-8-2009 10311790@unknown@formal@none@1@S@(If a developer submits existing patented or copyrighted work to a GPL project claiming it as their own contribution, all the project maintainers and even other developers can be held legally responsible for damages to the copyright or patent holder.)@@@@1@40@@danf@17-8-2009 10311800@unknown@formal@none@1@S@Lack of indemnification is one criticism that led Mozilla to create the [[Mozilla Public License]] rather than use the GPL or LGPL.@@@@1@22@@danf@17-8-2009 10311810@unknown@formal@none@1@S@However, Mozilla later relicensed its work under a GPL/LGPL/MPL triple license, due to problems with the GPL-incompatibility of the MPL.@@@@1@20@@danf@17-8-2009 10311820@unknown@formal@none@1@S@Some software developers have found the extensive scope of the GPL to be too restrictive.@@@@1@15@@danf@17-8-2009 10311830@unknown@formal@none@1@S@For example, Bjørn Reese and Daniel Stenberg describe how the downstream effects of the GPL on later developers create a "quodque pro quo" (Latin, "Everything in return for something").@@@@1@29@@danf@17-8-2009 10311840@unknown@formal@none@1@S@For that reason, in 2001 they abandoned the GPLv2 in favor of less restrictive copyleft licenses.@@@@1@16@@danf@17-8-2009 10311850@unknown@formal@none@1@S@A more specific example of the downstream effects of the GPL can be observed through the frame of incompatible licenses.@@@@1@20@@danf@17-8-2009 10311860@unknown@formal@none@1@S@Sun Microsystems' ZFS, because it is licensed under the GPL-incompatible CDDL and covered by several Sun patents, cannot link to the GPL-licensed Linux
kernel.@@@@1@24@@danf@17-8-2009 10311870@unknown@formal@none@1@S@Some have also argued that the GPL could, and should, be shorter.@@@@1@12@@danf@17-8-2009 10320010@unknown@formal@none@1@S@
Google
@@@@1@1@@danf@17-8-2009 10320020@unknown@formal@none@1@S@'''Google Inc.''' is an [[United States|American]] [[public company|public corporation]], earning revenue from [[AdWords|advertising]] related to its [[Google search|Internet search]], [[Gmail|web-based e-mail]], [[Google Maps|online mapping]], [[Google Apps|office productivity]], [[Orkut|social networking]], and [[YouTube|video sharing]] services as well as selling advertising-free versions of the [[Google Search Appliance|same technologies]].@@@@1@48@@danf@17-8-2009 10320030@unknown@formal@none@1@S@Google's headquarters, the [[Googleplex]], is located in [[Mountain View, California]].@@@@1@10@@danf@17-8-2009 10320040@unknown@formal@none@1@S@As of [[June 30]], [[2008]], the company has 19,604 full-time employees.@@@@1@11@@danf@17-8-2009 10320050@unknown@formal@none@1@S@As of [[October 31]], [[2007]], it is the largest American company (by [[market capitalization]]) that is not part of the [[Dow Jones Industrial Average]].@@@@1@24@@danf@17-8-2009 10320060@unknown@formal@none@1@S@Google was co-founded by [[Larry Page]] and [[Sergey Brin]] while they were students at [[Stanford University]], and the company was first incorporated as a [[privately held company]] on [[September 7]], [[1998]].@@@@1@31@@danf@17-8-2009 10320070@unknown@formal@none@1@S@Google's [[initial public offering]] took place on [[August 19]], [[2004]], raising [[United States dollar|US$]]1.67 billion, making it worth US$23 billion.@@@@1@20@@danf@17-8-2009 10320080@unknown@formal@none@1@S@Google has continued its growth through a series of new product developments, [[List of Google acquisitions|acquisitions]], and [[Google#Partnerships|partnerships]].@@@@1@18@@danf@17-8-2009 10320090@unknown@formal@none@1@S@[[Google#Environmentalism|Environmentalism]], [[Google.org|philanthropy]], and [[Google#Corporate affairs and culture|positive employee relations]] have been important tenets during Google's growth, the latter resulting in being identified multiple times as [[Fortune Magazine|Fortune Magazine's]] #1 Best Place to Work.@@@@1@33@@danf@17-8-2009 10320100@unknown@formal@none@1@S@The company's unofficial slogan is "[[Don't be evil]]", although [[criticism of Google]] includes concerns regarding the [[privacy]] of personal information, [[copyright]], [[censorship by Google|censorship]], and discontinuation of services.@@@@1@28@@danf@17-8-2009 10320110@unknown@formal@none@1@S@==History==@@@@1@1@@danf@17-8-2009 10320120@unknown@formal@none@1@S@Google began in January 1996 as a research project by [[Larry Page]], who was soon joined by [[Sergey Brin]], two [[Doctor of Philosophy|Ph.D.]] students at [[Stanford University]] in [[California]].@@@@1@29@@danf@17-8-2009 10320130@unknown@formal@none@1@S@They hypothesized that a search engine that analyzed the relationships between websites would produce better ranking of results than existing techniques, which ranked results according to the number of times the search term appeared on a page.@@@@1@37@@danf@17-8-2009 10320140@unknown@formal@none@1@S@Their search engine was originally nicknamed "BackRub" because the system checked [[backlinks]] to estimate a site's importance.@@@@1@17@@danf@17-8-2009 10320150@unknown@formal@none@1@S@A small search engine called Rankdex was already exploring a similar strategy.@@@@1@12@@danf@17-8-2009 10320160@unknown@formal@none@1@S@Convinced that the pages with the most links to them from other highly relevant web pages must be the most relevant pages associated with the search, Page and Brin
tested their thesis as part of their studies, and laid the foundation for their search engine.@@@@1@45@@danf@17-8-2009 10320170@unknown@formal@none@1@S@Originally, the search engine used the [[Stanford University]] website with the domain ''google.stanford.edu''.@@@@1@13@@danf@17-8-2009 10320180@unknown@formal@none@1@S@The domain ''google.com'' was registered on [[September 15]], [[1997]], and the company was incorporated as ''Google Inc.'' on [[September 7]], [[1998]] at a friend's garage in [[Menlo Park, California]].@@@@1@29@@danf@17-8-2009 10320190@unknown@formal@none@1@S@The total initial investment raised for the new company amounted to almost US$1.1 million, including a US$100,000 check by [[Andy Bechtolsheim]], one of the founders of [[Sun Microsystems]].@@@@1@28@@danf@17-8-2009 10320200@unknown@formal@none@1@S@In March 1999, the company moved into offices in [[Palo Alto, California|Palo Alto]], home to several other noted [[Silicon Valley]] technology startups.@@@@1@22@@danf@17-8-2009 10320210@unknown@formal@none@1@S@After quickly outgrowing two other sites, the company leased a complex of buildings in [[Mountain View, Santa Clara County, California|Mountain View]] at 1600 Amphitheatre Parkway from [[Silicon Graphics]] (SGI) in 2003.@@@@1@31@@danf@17-8-2009 10320220@unknown@formal@none@1@S@The company has remained at this location ever since, and the complex has since come to be known as the [[Googleplex]] (a play on the word [[googolplex]]).@@@@1@27@@danf@17-8-2009 10320230@unknown@formal@none@1@S@In 2006, Google bought the property from SGI for US$319 million.@@@@1@11@@danf@17-8-2009 10320240@unknown@formal@none@1@S@The Google search engine attracted a loyal following among the growing number of Internet users, who liked its simple design and usability.@@@@1@22@@danf@17-8-2009 10320250@unknown@formal@none@1@S@In 2000, Google began selling [[advertising|advertisements]] associated with search [[keyword (internet search)|keywords]].@@@@1@12@@danf@17-8-2009 10320260@unknown@formal@none@1@S@The ads were text-based to maintain an uncluttered page design and to maximize page loading speed.@@@@1@16@@danf@17-8-2009 10320270@unknown@formal@none@1@S@Keywords were sold based on a combination of price bid and clickthroughs, with bidding starting at US$.05 per click.@@@@1@19@@danf@17-8-2009 10320280@unknown@formal@none@1@S@This model of selling keyword advertising was pioneered by [[Yahoo! Search Marketing|Goto.com]] (later renamed Overture Services, before being acquired by [[Yahoo!]] and rebranded as [[Yahoo! 
Search Marketing]]).@@@@1@27@@danf@17-8-2009 10320290@unknown@formal@none@1@S@While many of its [[dot-com]] rivals failed in the new Internet marketplace, Google quietly rose in stature while generating revenue.@@@@1@20@@danf@17-8-2009 10320300@unknown@formal@none@1@S@The name "Google" originated from a common misspelling of the word "[[googol]]", which refers to 10<sup>100</sup>, the number represented by a 1 followed by one hundred zeros.@@@@1@27@@danf@17-8-2009 10320310@unknown@formal@none@1@S@Having found its way increasingly into everyday language, the verb "[[google (verb)|google]]" was added to the ''[[Merriam-Webster|Merriam Webster Collegiate Dictionary]]'' and the ''[[Oxford English Dictionary]]'' in 2006, meaning "to use the Google search engine to obtain information on the Internet."@@@@1@40@@danf@17-8-2009 10320320@unknown@formal@none@1@S@A [[patent]] describing part of Google's ranking mechanism ([[PageRank]]) was granted on [[September 4]], [[2001]].@@@@1@15@@danf@17-8-2009 10320330@unknown@formal@none@1@S@The patent was officially assigned to Stanford University and lists Lawrence Page as the inventor.@@@@1@15@@danf@17-8-2009 10320340@unknown@formal@none@1@S@===Financing and initial public offering===@@@@1@5@@danf@17-8-2009 10320350@unknown@formal@none@1@S@The first funding for Google as a company was secured in 1998, in the form of a US$100,000 contribution from [[Andy Bechtolsheim]], co-founder of [[Sun Microsystems]], given to a corporation which did not yet exist.@@@@1@35@@danf@17-8-2009 10320360@unknown@formal@none@1@S@Around six months later, a much larger round of funding was announced, with the major investors being rival venture capital firms [[Kleiner Perkins Caufield & Byers]] and [[Sequoia Capital]].@@@@1@29@@danf@17-8-2009 10320370@unknown@formal@none@1@S@Google's [[IPO]] took place on [[August 19]], [[2004]].@@@@1@8@@danf@17-8-2009 10320380@unknown@formal@none@1@S@19,605,052 [[stock|shares]] were offered at a price of US$85 per share.@@@@1@11@@danf@17-8-2009 10320390@unknown@formal@none@1@S@Of that, 14,142,135 (another mathematical reference, as [[square root of two|√2]] ≈ 1.4142135) were floated by Google, and the remaining 5,462,917 were offered by existing stockholders.@@@@1@26@@danf@17-8-2009 10320400@unknown@formal@none@1@S@The sale of US$1.67 billion gave Google a [[market capitalization]] of more than US$23 billion.@@@@1@15@@danf@17-8-2009 10320410@unknown@formal@none@1@S@The vast majority of Google's 271 million shares remained under Google's control.@@@@1@12@@danf@17-8-2009 10320420@unknown@formal@none@1@S@Many of Google's employees became instant [[paper millionaires]].@@@@1@8@@danf@17-8-2009 10320430@unknown@formal@none@1@S@[[Yahoo!]], a competitor of Google, also benefited from the IPO because it owned 8.4 million shares of Google as of [[August 9]], [[2004]], ten days before the IPO.@@@@1@28@@danf@17-8-2009 10320440@unknown@formal@none@1@S@Google's stock has performed well since the IPO, with shares hitting US$700 for the first time on [[October 31]], [[2007]], due to strong sales and earnings in the advertising market, as well as the release of new features such as the [[Google Desktop|desktop search function]] and its iGoogle personalized home page.@@@@1@54@@danf@17-8-2009 10320450@unknown@formal@none@1@S@The surge in stock price is fueled primarily by individual investors, as opposed to large institutional investors and [[mutual fund]]s.@@@@1@20@@danf@17-8-2009 10320460@unknown@formal@none@1@S@The company is listed on the [[NASDAQ]]
stock exchange under the [[ticker]] symbol '''GOOG''' and on the [[London Stock Exchange]] under the ticker symbol '''GGEA'''.@@@@1@25@@danf@17-8-2009 10320470@unknown@formal@none@1@S@===Growth===@@@@1@1@@danf@17-8-2009 10320480@unknown@formal@none@1@S@While the company's primary business interest is in the web content arena, Google has begun experimenting with other markets, such as [[radio]] and print publications.@@@@1@25@@danf@17-8-2009 10320490@unknown@formal@none@1@S@On [[January 17]], [[2006]], Google announced its purchase of "dMarc", a radio advertising company that provides an automated system allowing companies to advertise on the radio.@@@@1@28@@danf@17-8-2009 10320500@unknown@formal@none@1@S@This will allow Google to combine two niche advertising media—the Internet and radio—with Google's ability to laser-focus on the tastes of consumers.@@@@1@22@@danf@17-8-2009 10320510@unknown@formal@none@1@S@Google has also begun an experiment in selling advertisements from its advertisers in offline newspapers and magazines, with select advertisements in the [[Chicago Sun-Times]].@@@@1@24@@danf@17-8-2009 10320520@unknown@formal@none@1@S@They have been filling unsold space in the newspaper that would have normally been used for in-house advertisements.@@@@1@18@@danf@17-8-2009 10320530@unknown@formal@none@1@S@Google was added to the [[S&P 500 index]] on [[March 30]], [[2006]].@@@@1@12@@danf@17-8-2009 10320540@unknown@formal@none@1@S@It replaced [[Burlington Resources]], a major oil producer based in [[Houston]], which was acquired by [[ConocoPhillips]].@@@@1@16@@danf@17-8-2009 10320550@unknown@formal@none@1@S@===Acquisitions===@@@@1@1@@danf@17-8-2009 10320560@unknown@formal@none@1@S@Since 2001, Google has acquired several small start-up companies, often consisting of innovative teams and products.@@@@1@16@@danf@17-8-2009 10320570@unknown@formal@none@1@S@One of the earlier companies that Google bought was [[Pyra Labs]].@@@@1@11@@danf@17-8-2009 10320580@unknown@formal@none@1@S@They were the creators of [[Blogger (service)|Blogger]], a weblog publishing platform, first launched in 1999.@@@@1@15@@danf@17-8-2009 10320590@unknown@formal@none@1@S@This acquisition led to many premium features becoming free.@@@@1@9@@danf@17-8-2009 10320600@unknown@formal@none@1@S@Pyra Labs was originally formed by [[Evan Williams (blogger)|Evan Williams]], who left Google in 2004.@@@@1@16@@danf@17-8-2009 10320610@unknown@formal@none@1@S@In early 2006, Google acquired Upstartle, a company responsible for the online word processor [[Writely]].@@@@1@15@@danf@17-8-2009 10320620@unknown@formal@none@1@S@The technology in this product was used by Google to eventually create [[Google Docs & Spreadsheets]].@@@@1@16@@danf@17-8-2009 10320630@unknown@formal@none@1@S@In 2004, Google acquired a company called [[Keyhole, Inc.]], which developed a product called ''Earth Viewer'', renamed [[Google Earth]] in 2005.@@@@1@24@@danf@17-8-2009 10320640@unknown@formal@none@1@S@In February 2006, software company Adaptive Path sold Measure Map, a [[weblog]] statistics application, to Google.@@@@1@16@@danf@17-8-2009 10320650@unknown@formal@none@1@S@Registration to the service has since been temporarily disabled.@@@@1@9@@danf@17-8-2009 10320660@unknown@formal@none@1@S@The last update regarding the future of Measure Map was made on [[April 6]], [[2006]] and outlined many of the service's known issues.@@@@1@23@@danf@17-8-2009 10320670@unknown@formal@none@1@S@In late 2006, Google bought online video site [[YouTube]] for
US$1.65 billion in stock.@@@@1@14@@danf@17-8-2009 10320680@unknown@formal@none@1@S@Shortly after, on [[October 31]], [[2006]], Google announced that it had also acquired [[JotSpot]], a developer of wiki technology for collaborative Web sites.@@@@1@23@@danf@17-8-2009 10320690@unknown@formal@none@1@S@On [[April 13]], [[2007]], Google reached an agreement to acquire [[DoubleClick]].@@@@1@11@@danf@17-8-2009 10320700@unknown@formal@none@1@S@Google agreed to buy the company for US$3.1 billion.@@@@1@9@@danf@17-8-2009 10320710@unknown@formal@none@1@S@On [[July 9]], [[2007]], Google announced that it had signed a definitive agreement to acquire enterprise messaging security and compliance company [[Postini]].@@@@1@22@@danf@17-8-2009 10320720@unknown@formal@none@1@S@===Partnerships===@@@@1@1@@danf@17-8-2009 10320730@unknown@formal@none@1@S@In 2005, Google entered into partnerships with other companies and government agencies to improve production and services.@@@@1@17@@danf@17-8-2009 10320740@unknown@formal@none@1@S@Google announced a partnership with [[NASA Ames Research Center]] to build offices and work on research projects involving large-scale data management, [[nanotechnology]], [[distributed computing]], and the entrepreneurial space industry.@@@@1@31@@danf@17-8-2009 10320750@unknown@formal@none@1@S@Google also entered into a partnership with [[Sun Microsystems]] in October to help share and distribute each other's technologies.@@@@1@19@@danf@17-8-2009 10320760@unknown@formal@none@1@S@The company entered into a partnership with [[Time Warner]]'s [[AOL]] to enhance each other's video search services.@@@@1@17@@danf@17-8-2009 10320770@unknown@formal@none@1@S@The same year, the company became a major financial investor in the new [[.mobi]] [[top-level domain]] for mobile devices, in conjunction with several other companies, including [[Microsoft]], [[Nokia]], and [[Ericsson]].@@@@1@32@@danf@17-8-2009 10320780@unknown@formal@none@1@S@In September 2007, Google launched "AdSense for Mobile", a service for its publishing partners that provides the ability to monetize their mobile websites through the targeted placement of mobile text ads, and acquired the mobile social networking site ''Zingku.mobi'' to "provide people worldwide with direct access to Google applications, and ultimately the information they want and need, right from their mobile devices."@@@@1@62@@danf@17-8-2009 10320790@unknown@formal@none@1@S@In 2006, Google and [[News Corporation|News Corp.]]'s Fox Interactive Media entered into a US$900 million agreement to provide search and advertising on the popular social networking site [[MySpace]].@@@@1@28@@danf@17-8-2009 10320800@unknown@formal@none@1@S@On November 5, 2007, Google announced the [[Open Handset Alliance]] to develop an open platform for mobile services called [[Google Android|Android]].@@@@1@21@@danf@17-8-2009 10320810@unknown@formal@none@1@S@In March 2008, Google, [[Sprint]], [[Intel]], [[Comcast]], [[Time Warner Cable]], [[Bright House Networks]], and [[Clearwire]] together founded [[Xohm]] to provide wireless [[telecommunication]] service.@@@@1@19@@danf@17-8-2009 10320820@unknown@formal@none@1@S@==Products and services==@@@@1@3@@danf@17-8-2009 10320830@unknown@formal@none@1@S@Google has created services and tools for the general public and business environment alike, including Web applications, advertising networks, and solutions for businesses.@@@@1@23@@danf@17-8-2009 10320840@unknown@formal@none@1@S@===Advertising===@@@@1@1@@danf@17-8-2009
10320850@unknown@formal@none@1@S@Most of Google's revenue is derived from advertising programs.@@@@1@9@@danf@17-8-2009 10320860@unknown@formal@none@1@S@For the 2006 fiscal year, the company reported US$10.492 billion in total advertising revenues and only US$112 million in licensing and other revenues.@@@@1@23@@danf@17-8-2009 10320870@unknown@formal@none@1@S@Google [[AdWords]] allows Web advertisers to display advertisements in Google's search results and the Google Content Network, through either a cost-per-click or cost-per-view scheme.@@@@1@24@@danf@17-8-2009 10320880@unknown@formal@none@1@S@Through Google [[AdSense]], website owners can also display adverts on their own sites and earn money every time ads are clicked.@@@@1@20@@danf@17-8-2009 10320890@unknown@formal@none@1@S@===Web-based software===@@@@1@2@@danf@17-8-2009 10320900@unknown@formal@none@1@S@The [[Google search|Google web search engine]] is the company's most popular service.@@@@1@12@@danf@17-8-2009 10320910@unknown@formal@none@1@S@As of August 2007, Google is the most used [[search engine]] on the web with a 53.6% market share, ahead of [[Yahoo!]] (19.9%) and [[Live Search]] (12.9%).@@@@1@27@@danf@17-8-2009 10320920@unknown@formal@none@1@S@Google indexes billions of Web pages, so that users can search for the information they desire, through the use of [[keyword (Internet search)|keywords]] and [[operators]].@@@@1@25@@danf@17-8-2009 10320930@unknown@formal@none@1@S@Google has also applied its Web search technology to other search services, including Image Search, [[Google News]], the price comparison site [[Google Product Search]], the interactive [[Usenet]] archive [[Google Groups]], [[Google Maps]], and more.@@@@1@34@@danf@17-8-2009 10320940@unknown@formal@none@1@S@In 2004, Google launched its own free web-based e-mail service, known as [[Gmail]] (or Google Mail in some jurisdictions).@@@@1@19@@danf@17-8-2009 10320950@unknown@formal@none@1@S@Gmail features [[e-mail filtering|spam-filtering technology]] and the capability to use Google technology to search e-mail.@@@@1@15@@danf@17-8-2009 10320960@unknown@formal@none@1@S@The service generates revenue by displaying advertisements and links from the [[AdWords]] service that are tailored to the choice of the user and/or content of the e-mail messages displayed on screen.@@@@1@31@@danf@17-8-2009 10320970@unknown@formal@none@1@S@In early 2006, the company launched [[Google Video]], which not only allows users to search and view freely available videos but also offers users and media publishers the ability to publish their content, including television shows on [[CBS]], [[NBA]] basketball games, and music videos.@@@@1@44@@danf@17-8-2009 10320980@unknown@formal@none@1@S@In August 2007, Google announced that it would shut down its video rental and sale program and offer refunds and [[Google Checkout]] credits to consumers who had purchased videos to own.@@@@1@31@@danf@17-8-2009 10320990@unknown@formal@none@1@S@On [[February 28]], [[2008]], Google launched the [[Google Sites]] [[wiki]] as a [[Google Apps]] component.@@@@1@15@@danf@17-8-2009 10321000@unknown@formal@none@1@S@Google has also developed several desktop applications, including [[Google Earth]], an interactive mapping program powered by satellite and aerial imagery that covers the vast majority of the planet.@@@@1@28@@danf@17-8-2009 10321010@unknown@formal@none@1@S@Google Earth is generally considered to be remarkably accurate and extremely detailed.@@@@1@12@@danf@17-8-2009 10321020@unknown@formal@none@1@S@Many major cities have
such detailed images that one can zoom in close enough to see vehicles and pedestrians clearly.@@@@1@20@@danf@17-8-2009 10321030@unknown@formal@none@1@S@Consequently, there have been some concerns about national security implications.@@@@1@10@@danf@17-8-2009 10321040@unknown@formal@none@1@S@Specifically, some countries and militaries contend the software can be used to pinpoint with near-precision accuracy the physical location of critical infrastructure, commercial and residential buildings, bases, government agencies, and so on.@@@@1@32@@danf@17-8-2009 10321050@unknown@formal@none@1@S@However, the satellite images are not necessarily frequently updated, and all of them are available at no charge through other products and even government sources.@@@@1@25@@danf@17-8-2009 10321060@unknown@formal@none@1@S@For example, [[NASA]] and the [[NGA|National Geospatial-Intelligence Agency]].@@@@1@8@@danf@17-8-2009 10321070@unknown@formal@none@1@S@Some counter this argument by stating that Google Earth makes it easier to access and research the images.@@@@1@18@@danf@17-8-2009 10321080@unknown@formal@none@1@S@Many other products are available through [[Google Labs]], which is a collection of incomplete applications that are still being tested for use by the general public.@@@@1@26@@danf@17-8-2009 10321090@unknown@formal@none@1@S@Google has promoted their products in various ways.@@@@1@8@@danf@17-8-2009 10321100@unknown@formal@none@1@S@In [[London]], ''Google Space'' was set-up in [[Heathrow Airport]], showcasing several products, including Gmail, Google Earth and Picasa.@@@@1@18@@danf@17-8-2009 10321110@unknown@formal@none@1@S@Also, a similar page was launched for American college students, under the name ''College Life, Powered by Google.''@@@@1@18@@danf@17-8-2009 10321120@unknown@formal@none@1@S@In 2007, some reports surfaced that Google was planning the release of its own mobile phone, possibly a competitor to [[Apple Inc.|Apple]]'s [[iPhone]].@@@@1@23@@danf@17-8-2009 10321130@unknown@formal@none@1@S@The project, called [[Android (mobile phone platform)|Android]] provides a standard development kit that will allow any "Android" phone to run software developed for the Android SDK, no matter the phone manufacturer.@@@@1@31@@danf@17-8-2009 10321140@unknown@formal@none@1@S@In October 2007, Google SMS service was launched in [[India]] allowing users to get business listings, movie showtimes, and information by sending an [[SMS]].@@@@1@24@@danf@17-8-2009 10321150@unknown@formal@none@1@S@===Enterprise products===@@@@1@2@@danf@17-8-2009 10321160@unknown@formal@none@1@S@In 2007, Google launched [[Google Apps|Google Apps Premier Edition]], a version of Google Apps targeted primarily at the business user.@@@@1@20@@danf@17-8-2009 10321170@unknown@formal@none@1@S@It includes such extras as more disk space for e-mail, API access, and premium support, for a price of US$50 per user per year.@@@@1@24@@danf@17-8-2009 10321180@unknown@formal@none@1@S@A large implementation of Google Apps with 38,000 users is at [[Lakehead University]] in [[Thunder Bay, Ontario|Thunder Bay]], Ontario, Canada.@@@@1@20@@danf@17-8-2009 10321190@unknown@formal@none@1@S@==Platform==@@@@1@1@@danf@17-8-2009 10321200@unknown@formal@none@1@S@Google runs its services on several [[server farm]]s, each comprising thousands of low-cost commodity computers running stripped-down versions of [[Linux]].@@@@1@20@@danf@17-8-2009 10321210@unknown@formal@none@1@S@While the company divulges no details of its hardware, a 2006 estimate cites 450,000 servers, 
"racked up in clusters at data centers around the world."@@@@1@25@@danf@17-8-2009 10321220@unknown@formal@none@1@S@==Corporate affairs and culture==@@@@1@4@@danf@17-8-2009 10321230@unknown@formal@none@1@S@Google is known for its relaxed corporate culture, of which its playful variations on [[Google logo#History of the Google Doodle|its own corporate logo]] are an indicator.@@@@1@26@@danf@17-8-2009 10321240@unknown@formal@none@1@S@In 2007 and 2008, ''[[Fortune Magazine]]'' placed Google at the top of its list of the hundred best places to work.@@@@1@21@@danf@17-8-2009 10321250@unknown@formal@none@1@S@Google's corporate philosophy embodies such casual principles as "you can make money without doing evil," "you can be serious without a suit," and "work should be challenging and the challenge should be fun."@@@@1@33@@danf@17-8-2009 10321260@unknown@formal@none@1@S@Google has been criticized for having salaries below industry standards.@@@@1@10@@danf@17-8-2009 10321270@unknown@formal@none@1@S@For example, some [[system administrator]]s earn no more than US$35,000 per year – considered to be quite low for the [[San Francisco Bay Area|Bay Area]] job market.@@@@1@27@@danf@17-8-2009 10321280@unknown@formal@none@1@S@However, Google's stock performance following its [[Initial public offering|IPO]] has enabled many early employees to be competitively compensated by participation in the corporation's remarkable equity growth.@@@@1@26@@danf@17-8-2009 10321290@unknown@formal@none@1@S@Google implemented other employee incentives in 2005, such as the [[Google Founders' Award]], in addition to offering higher salaries to new employees.@@@@1@22@@danf@17-8-2009 10321300@unknown@formal@none@1@S@Google's workplace amenities, culture, global popularity, and strong brand recognition have also attracted potential applicants.@@@@1@15@@danf@17-8-2009 10321310@unknown@formal@none@1@S@After the company's [[IPO]] in August 2004, it was reported that founders [[Sergey Brin]] and [[Larry Page]], and CEO [[Eric E. 
Schmidt|Eric Schmidt]], requested that their base salary be cut to US$1.00.@@@@1@32@@danf@17-8-2009 10321320@unknown@formal@none@1@S@Subsequent offers by the company to increase their salaries have been turned down, primarily because "their primary compensation continues to come from returns on their ownership stakes in Google.@@@@1@29@@danf@17-8-2009 10321330@unknown@formal@none@1@S@As significant stockholders, their personal wealth is tied directly to sustained stock price appreciation and performance, which provides direct alignment with stockholder interests."@@@@1@23@@danf@17-8-2009 10321340@unknown@formal@none@1@S@Prior to 2004, Schmidt was making US$250,000 per year, and Page and Brin each earned a salary of US$150,000.@@@@1@19@@danf@17-8-2009 10321350@unknown@formal@none@1@S@They have all declined recent offers of bonuses and increases in compensation by Google's board of directors.@@@@1@17@@danf@17-8-2009 10321360@unknown@formal@none@1@S@In a 2007 report of the United States' richest people, [[Forbes]] reported that [[Sergey Brin]] and [[Larry Page]] were tied for #5 with a net worth of US$18.5 billion each.@@@@1@30@@danf@17-8-2009 10321370@unknown@formal@none@1@S@In 2007 and through early 2008, Google saw the departure of several top executives.@@@@1@15@@danf@17-8-2009 10321380@unknown@formal@none@1@S@Justin Rosenstein, a Google product manager, left in June 2007.@@@@1@10@@danf@17-8-2009 10321390@unknown@formal@none@1@S@Shortly thereafter, Gideon Yu, former chief financial officer of [[YouTube]], a Google unit, joined [[Facebook]] along with Benjamin Ling, a high-ranking engineer, who left in October 2007.@@@@1@27@@danf@17-8-2009 10321400@unknown@formal@none@1@S@In March 2008, two senior Google leaders announced their desire to pursue other opportunities.@@@@1@14@@danf@17-8-2009 10321410@unknown@formal@none@1@S@Sheryl Sandberg, ex-VP of global online sales and operations, began her position as COO of [[Facebook]], while Ash ElDifrawi, former head of brand advertising, left to become CMO of [[Netshops]] Inc.@@@@1@31@@danf@17-8-2009 10321420@unknown@formal@none@1@S@===Googleplex===@@@@1@1@@danf@17-8-2009 10321430@unknown@formal@none@1@S@Google's headquarters in Mountain View, California, is referred to as "the [[Googleplex]]" in a play on words; a [[googolplex]] being 1 followed by a googol of zeros, and the HQ being a [[complex]] of buildings (cf.
[[movie theater|multiplex]], cineplex, etc).@@@@1@40@@danf@17-8-2009 10321440@unknown@formal@none@1@S@The lobby is decorated with a [[piano]], [[lava lamps]], old server clusters, and a projection of search queries on the wall.@@@@1@21@@danf@17-8-2009 10321450@unknown@formal@none@1@S@The hallways are full of exercise balls and [[bicycle]]s.@@@@1@9@@danf@17-8-2009 10321460@unknown@formal@none@1@S@Each employee has access to the corporate recreation center.@@@@1@9@@danf@17-8-2009 10321470@unknown@formal@none@1@S@Recreational amenities are scattered throughout the campus and include a workout room with weights and rowing machines, locker rooms, washers and dryers, a massage room, assorted [[video game]]s, [[Foosball]], a [[piano|baby grand piano]], a pool table, and [[ping pong]].@@@@1@39@@danf@17-8-2009 10321480@unknown@formal@none@1@S@In addition to the [[Recreation room|rec room]], there are snack rooms stocked with various foods and drinks.@@@@1@17@@danf@17-8-2009 10321490@unknown@formal@none@1@S@In 2006, Google moved into office space in [[New York City]], at 111 [[Eighth Avenue|Eighth Ave.]] in Manhattan.@@@@1@19@@danf@17-8-2009 10321500@unknown@formal@none@1@S@The office was specially designed and built for Google and houses its largest advertising sales team, which has been instrumental in securing large partnerships, most recently deals with [[MySpace]] and [[AOL]].@@@@1@31@@danf@17-8-2009 10321510@unknown@formal@none@1@S@In 2003, the company added an engineering staff in New York City, which has been responsible for more than 100 engineering projects, including [[Google Maps]], [[Google Spreadsheet]]s, and others.@@@@1@28@@danf@17-8-2009 10321520@unknown@formal@none@1@S@It is estimated that the building costs Google US$10 million per year to rent and is similar in design and functionality to its [[Mountain View, California|Mountain View]] headquarters, including [[foosball]], [[air hockey]], and ping-pong tables, as well as a video game area.@@@@1@42@@danf@17-8-2009 10321530@unknown@formal@none@1@S@In November 2006, Google opened offices on [[Carnegie Mellon]]'s campus in [[Pittsburgh, Pennsylvania|Pittsburgh]].@@@@1@13@@danf@17-8-2009 10321540@unknown@formal@none@1@S@By late 2006, Google also established a new headquarters for its AdWords division in [[Ann Arbor, Michigan]].@@@@1@17@@danf@17-8-2009 10321550@unknown@formal@none@1@S@The size of Google's search system is presently undisclosed.@@@@1@9@@danf@17-8-2009 10321560@unknown@formal@none@1@S@The best estimates place the total number of the company's servers at 450,000, spread over twenty-five locations throughout the world, including major [[network operations center|operations centers]] in [[Dublin]] (European Operations [[Headquarters]]) and [[Atlanta, Georgia]].@@@@1@35@@danf@17-8-2009 10321570@unknown@formal@none@1@S@Google is also in the process of constructing a major operations center in [[The Dalles, Oregon]], on the banks of the [[Columbia River]].@@@@1@23@@danf@17-8-2009 10321580@unknown@formal@none@1@S@The site, also referred to by the media as ''Project 02'', was chosen due to the availability of inexpensive [[hydroelectric power]] and a large surplus of [[fiber optic]] cable, remnants of the dot-com boom of the late 1990s.@@@@1@39@@danf@17-8-2009 10321590@unknown@formal@none@1@S@The computing center is estimated to be the size of two [[American football|football fields]], and it has created hundreds of construction jobs, causing local real estate prices to increase 40%.@@@@1@30@@danf@17-8-2009
10321600@unknown@formal@none@1@S@Upon completion, the center is expected to create 60 to 200 permanent jobs in the town of 12,000 people.@@@@1@19@@danf@17-8-2009 10321610@unknown@formal@none@1@S@Google is taking steps to ensure that their operations are environmentally sound.@@@@1@12@@danf@17-8-2009 10321620@unknown@formal@none@1@S@In October 2006, the company announced plans to install thousands of [[Photovoltaic module|solar panels]] to provide up to 1.6 [[megawatt]]s of [[electricity]], enough to satisfy approximately 30% of the campus' energy needs.@@@@1@31@@danf@17-8-2009 10321630@unknown@formal@none@1@S@The system will be the largest solar power system constructed on a [[United States|U.S.]] corporate campus and one of the largest on any corporate site in the world.@@@@1@28@@danf@17-8-2009 10321640@unknown@formal@none@1@S@In June 2007, Google announced that they plan to become [[carbon neutral]] by 2008, which includes investing in energy efficiency, renewable energy sources, and purchasing carbon offsets, such as investing in projects like capturing and burning [[methane]] from animal waste at Mexican and Brazilian farms.@@@@1@45@@danf@17-8-2009 10321650@unknown@formal@none@1@S@===Innovation time off===@@@@1@3@@danf@17-8-2009 10321660@unknown@formal@none@1@S@As an interesting motivation technique (usually called [[ITO|Innovation Time Off]]), all Google engineers are encouraged to spend 20% of their work time (one day per week) on projects that interest them.@@@@1@31@@danf@17-8-2009 10321670@unknown@formal@none@1@S@Some of Google's newer services, such as [[Gmail]], [[Google News]], [[Orkut]], and [[AdSense]] originated from these independent endeavors.@@@@1@18@@danf@17-8-2009 10321680@unknown@formal@none@1@S@In a talk at [[Stanford University]], [[Marissa Mayer]], Google's Vice President of Search Products and User Experience, stated that her analysis showed that half of the new product launches originated from the 20% time.@@@@1@34@@danf@17-8-2009 10321690@unknown@formal@none@1@S@===Easter eggs and April Fool's Day jokes===@@@@1@7@@danf@17-8-2009 10321700@unknown@formal@none@1@S@Google has a tradition of creating [[April Fool's Day]] jokes—such as [[Google's hoaxes#2000|Google MentalPlex]], which allegedly featured the use of mental power to search the web.@@@@1@26@@danf@17-8-2009 10321710@unknown@formal@none@1@S@In 2002, they claimed that [[pigeons]] were the [[Google's hoaxes#2002: Pigeon Rank|secret]] behind their growing [[search engine]].@@@@1@17@@danf@17-8-2009 10321720@unknown@formal@none@1@S@In 2004, they featured [[Google's hoaxes#2004: Google Lunar/Copernicus Center|Google Lunar]] (which claimed to feature jobs on the [[moon]]), and in 2005, a [[fiction|fictitious]] brain-boosting drink, termed [[Google's hoaxes#2005: Google Gulp|Google Gulp]] was announced.@@@@1@33@@danf@17-8-2009 10321730@unknown@formal@none@1@S@In 2006, they came up with [[Google's hoaxes#2006: Google Romance|Google Romance]], a hypothetical [[online dating]] service.@@@@1@16@@danf@17-8-2009 10321740@unknown@formal@none@1@S@In 2007, Google announced two joke products.@@@@1@7@@danf@17-8-2009 10321750@unknown@formal@none@1@S@The first was a free wireless Internet service called [[TiSP]] (Toilet Internet Service Provider) in which one obtained a connection by flushing one end of a [[fiber-optic]] cable down their toilet and waiting only an hour for a "Plumbing Hardware Dispatcher (PHD)" to connect it to the Internet.@@@@1@48@@danf@17-8-2009 10321760@unknown@formal@none@1@S@Additionally, Google's [[Gmail]] 
page displayed an announcement for [[Gmail Paper]], which allows users of their free email service to have email messages printed and shipped to a snail mail address.@@@@1@30@@danf@17-8-2009 10321770@unknown@formal@none@1@S@Google's services contain a number of [[Easter egg (virtual)|Easter eggs]]; for instance, the Language Tools page offers the search interface in the [[Swedish Chef]]'s "Bork bork bork," [[Pig Latin]], ”Hacker” (actually [[leetspeak]]), [[Elmer Fudd]], and [[Klingon language|Klingon]].@@@@1@37@@danf@17-8-2009 10321780@unknown@formal@none@1@S@In addition, the search engine calculator provides the [[Answer to Life, the Universe, and Everything]] from [[Douglas Adams]]' ''[[The Hitchhiker's Guide to the Galaxy]]''.@@@@1@24@@danf@17-8-2009 10321790@unknown@formal@none@1@S@As Google's search box can be used as a unit converter (as well as a calculator), some non-standard units are built in, such as the [[Smoot]].@@@@1@26@@danf@17-8-2009 10321800@unknown@formal@none@1@S@Google also routinely modifies its logo in accordance with various holidays or special events throughout the year, such as [[Christmas]], [[Mother's Day]], or the [[birthday]]s of various notable individuals.@@@@1@29@@danf@17-8-2009 10321810@unknown@formal@none@1@S@===IPO and culture===@@@@1@3@@danf@17-8-2009 10321820@unknown@formal@none@1@S@Many people speculated that Google's [[initial public offering|IPO]] would inevitably lead to changes in the company's culture, because of shareholder pressure for employee benefit reductions and short-term advances, or because a large number of the company's employees would suddenly become millionaires on paper.@@@@1@43@@danf@17-8-2009 10321830@unknown@formal@none@1@S@In a report given to potential investors, co-founders Sergey Brin and Larry Page promised that the IPO would not change the company's culture.@@@@1@23@@danf@17-8-2009 10321840@unknown@formal@none@1@S@Later Mr. 
Page said, "We think a lot about how to maintain our culture and the fun elements.@@@@1@18@@danf@17-8-2009 10321850@unknown@formal@none@1@S@We spent a lot of time getting our offices right.@@@@1@10@@danf@17-8-2009 10321860@unknown@formal@none@1@S@We think it's important to have a high density of people.@@@@1@11@@danf@17-8-2009 10321870@unknown@formal@none@1@S@People are packed together everywhere.@@@@1@5@@danf@17-8-2009 10321880@unknown@formal@none@1@S@We all share offices.@@@@1@4@@danf@17-8-2009 10321890@unknown@formal@none@1@S@We like this set of buildings because it's more like a densely packed university campus than a typical suburban office park."@@@@1@21@@danf@17-8-2009 10321900@unknown@formal@none@1@S@However, many analysts are finding that as Google grows, the company is becoming more "corporate".@@@@1@15@@danf@17-8-2009 10321910@unknown@formal@none@1@S@In 2005, articles in ''[[The New York Times]]'' and other sources began suggesting that Google had lost its anti-corporate, no evil philosophy.@@@@1@22@@danf@17-8-2009 10321920@unknown@formal@none@1@S@In an effort to maintain the company's unique culture, Google has designated a Chief Culture Officer in 2006, who also serves as the Director of Human Resources.@@@@1@27@@danf@17-8-2009 10321930@unknown@formal@none@1@S@The purpose of the Chief Culture Officer is to develop and maintain the culture and work on ways to keep true to the core values that the company was founded on in the beginning—a flat organization, a lack of hierarchy, a collaborative environment.@@@@1@43@@danf@17-8-2009 10321940@unknown@formal@none@1@S@===Philanthropy===@@@@1@1@@danf@17-8-2009 10321950@unknown@formal@none@1@S@In 2004, Google formed a for-profit philanthropic wing, [[Google.org]], with a start-up fund of US$1 billion.@@@@1@16@@danf@17-8-2009 10321960@unknown@formal@none@1@S@The express mission of the organization is to create awareness about [[climate change]], global public health, and [[global poverty]].@@@@1@19@@danf@17-8-2009 10321970@unknown@formal@none@1@S@One of its first projects is to develop a viable [[plug-in hybrid]] [[electric vehicle]] that can attain 100 [[fuel economy in automobiles|mpg]].@@@@1@22@@danf@17-8-2009 10321980@unknown@formal@none@1@S@The founding and current director is Dr. 
[[Larry Brilliant]].@@@@1@9@@danf@17-8-2009 10321990@unknown@formal@none@1@S@==Criticism==@@@@1@1@@danf@17-8-2009 10322000@unknown@formal@none@1@S@As it has grown, Google has found itself the focus of several controversies related to its business practices and services.@@@@1@20@@danf@17-8-2009 10322010@unknown@formal@none@1@S@For example, [[Google Book Search]]'s effort to digitize millions of books and make the full text searchable has led to [[copyright]] disputes with the [[Authors Guild]].@@@@1@26@@danf@17-8-2009 10322020@unknown@formal@none@1@S@Google's cooperation with the governments of [[People's Republic of China|China]], and to a lesser extent [[France]] and [[Germany]] (regarding [[Holocaust denial]]), to filter search results in accordance with regional laws and regulations has led to claims of [[censorship by Google|censorship]].@@@@1@40@@danf@17-8-2009 10322030@unknown@formal@none@1@S@Google's persistent [[HTTP cookie|cookie]] and other information collection practices have led to concerns over user [[Google and privacy issues|privacy]].@@@@1@19@@danf@17-8-2009 10322040@unknown@formal@none@1@S@As of [[December 11]], [[2007]], Google, like the [[Microsoft]] search engine, stores "personal information for 18 months"; by comparison, [[Yahoo!]] and [[AOL]] ([[Time Warner]]) "retain search requests for 13 months."@@@@1@31@@danf@17-8-2009 10322050@unknown@formal@none@1@S@A number of [[India]]n state governments have raised concerns about the security risks posed by geographic details provided by [[Google Earth]]'s satellite imaging.@@@@1@23@@danf@17-8-2009 10322060@unknown@formal@none@1@S@Google has also been criticized by advertisers regarding its inability to combat [[click fraud]], in which a person or automated script generates charges on an advertisement without having any real interest in the product.@@@@1@37@@danf@17-8-2009 10322070@unknown@formal@none@1@S@Industry reports in 2006 claimed that approximately 14 to 20 percent of clicks were in fact fraudulent or invalid.@@@@1@19@@danf@17-8-2009 10322080@unknown@formal@none@1@S@Further, Google has faced allegations of [[sexism]] and [[ageism]] from former employees.@@@@1@12@@danf@17-8-2009 10322090@unknown@formal@none@1@S@Google has also been accused in [[Harper's Magazine]] of excessive energy usage, and of employing its "[[Don't be evil]]" motto as well as its very public energy-saving campaigns to cover up or make up for the massive amounts of energy its servers actually require.@@@@1@55@@danf@17-8-2009 10322100@unknown@formal@none@1@S@Also, on [[July 1]], 2008, US District Court Judge [[Louis Stanton]] ordered Google to give [[YouTube]] user data and logs to [[Viacom]] to support its case in a billion-dollar [[copyright]] lawsuit against Google.@@@@1@33@@danf@17-8-2009 10322110@unknown@formal@none@1@S@On [[July 14]], 2008, however, Google and [[Viacom]] agreed on a [[compromise]] to protect [[YouTube]] users' personal data in the $1 billion (£497 million) copyright lawsuit.@@@@1@27@@danf@17-8-2009 10322120@unknown@formal@none@1@S@Google agreed to make user information and internet protocol addresses from its YouTube subsidiary anonymous before handing the data over to Viacom.@@@@1@23@@danf@17-8-2009 10322130@unknown@formal@none@1@S@The privacy deal also applied to other litigants including the [[FA Premier League]], the Rodgers & Hammerstein Organisation and the [[Scottish Premier League]].@@@@1@23@@danf@17-8-2009
10322140@unknown@formal@none@1@S@The deal, however, did not extend this anonymity to employees, because Viacom seeks to prove that Google staff were aware of illegal material being uploaded to the site.@@@@1@27@@danf@17-8-2009 10322150@unknown@formal@none@1@S@The parties will therefore meet again on the matter; otherwise the data would have to be made available to the court.@@@@1@18@@danf@17-8-2009 10330010@unknown@formal@none@1@S@
Google Translate
@@@@1@2@@danf@17-8-2009 10330020@unknown@formal@none@1@S@'''Google Translate''' is a service provided by [[Google|Google Inc.]] to translate a section of text, or a webpage, into another language, with limits on the number of paragraphs and the range of technical terms that are translated.@@@@1@34@@danf@17-8-2009 10330030@unknown@formal@none@1@S@For some languages, users are asked for alternate translations, such as for technical terms, to be included in future updates to the translation process.@@@@1@24@@danf@17-8-2009 10330040@unknown@formal@none@1@S@Unlike other translation services such as [[Babel Fish (website)|Babel Fish]], [[AOL]], and [[Yahoo!|Yahoo]], which use [[SYSTRAN]], Google uses its own translation software.@@@@1@22@@danf@17-8-2009 10330050@unknown@formal@none@1@S@== Functions ==@@@@1@3@@danf@17-8-2009 10330060@unknown@formal@none@1@S@The service also includes translation of an entire Web page.@@@@1@10@@danf@17-8-2009 10330070@unknown@formal@none@1@S@The translation is limited to a certain number of paragraphs per webpage (as indicated by break-tags <br>); however, if text on a webpage is separated by horizontal blank-line images (auto-wrapped without using any <br>), a long webpage containing several thousand words can be translated.@@@@1@43@@danf@17-8-2009 10330080@unknown@formal@none@1@S@Google Translate, like other automatic translation tools, has its limitations.@@@@1@10@@danf@17-8-2009 10330090@unknown@formal@none@1@S@While it can help the reader to understand the general content of a foreign language text, it does not deliver accurate translations and does not produce publication-standard content; for example, it often translates words out of context and deliberately does not apply any [[Grammar|grammatical]] rules.@@@@1@45@@danf@17-8-2009 10330100@unknown@formal@none@1@S@== Approach ==@@@@1@3@@danf@17-8-2009 10330110@unknown@formal@none@1@S@Google Translate is based on an approach called [[statistical machine translation]], and more specifically, on research by [[Franz-Josef Och]], who won the [[DARPA]] contest for speed in machine translation in 2003.@@@@1@30@@danf@17-8-2009 10330120@unknown@formal@none@1@S@Och is now the head of Google's machine translation department.@@@@1@10@@danf@17-8-2009 10330130@unknown@formal@none@1@S@According to Och, a solid base for developing a usable statistical machine translation system for a new pair of languages from scratch would consist of a bilingual [[text corpus]] (or [[parallel text|parallel collection]]) of more than a million words and two monolingual corpora of more than a billion words each.@@@@1@51@@danf@17-8-2009 10330140@unknown@formal@none@1@S@Statistical [[Mathematical model|models]] built from this data are then used to translate between those languages.@@@@1@14@@danf@17-8-2009 10330150@unknown@formal@none@1@S@To acquire this huge amount of linguistic data, Google used [[United Nations]] documents.@@@@1@13@@danf@17-8-2009 10330160@unknown@formal@none@1@S@The same document is normally available in all six official UN languages, so Google now has a six-language corpus of 20 billion words' worth of human translations.@@@@1@27@@danf@17-8-2009 10330170@unknown@formal@none@1@S@The availability of Arabic and Chinese as official UN languages is probably one of the reasons why Google Translate initially focused on the development of translation between English and those languages, and not, for example, [[Japanese language|Japanese]] and [[German language|German]], which are not official languages at the UN.@@@@1@48@@danf@17-8-2009
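In the classic noisy-channel formulation that underlies statistical machine translation (given here as general background, not as a description of Google's exact system), those two kinds of corpora correspond to the two models in the decision rule

:\hat{e} = \arg\max_{e} P(e)\,P(f \mid e)

where P(e) is a language model estimated from the monolingual target-language corpus and P(f \mid e) is a translation model estimated from the bilingual parallel collection.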
10330180@unknown@formal@none@1@S@Google representatives have been very active at domestic conferences in Japan in the field asking researchers to provide them with bilingual corpora.@@@@1@22@@danf@17-8-2009 10330190@unknown@formal@none@1@S@== Options ==@@@@1@3@@danf@17-8-2009 10330200@unknown@formal@none@1@S@(by chronological order)@@@@1@3@@danf@17-8-2009 10330210@unknown@formal@none@1@S@*Beginning@@@@1@1@@danf@17-8-2009 10330220@unknown@formal@none@1@S@**English to Arabic@@@@1@3@@danf@17-8-2009 10330230@unknown@formal@none@1@S@**English to French@@@@1@3@@danf@17-8-2009 10330240@unknown@formal@none@1@S@**English to German@@@@1@3@@danf@17-8-2009 10330250@unknown@formal@none@1@S@**English to Spanish@@@@1@3@@danf@17-8-2009 10330260@unknown@formal@none@1@S@**French to English@@@@1@3@@danf@17-8-2009 10330270@unknown@formal@none@1@S@**German to English@@@@1@3@@danf@17-8-2009 10330280@unknown@formal@none@1@S@**Spanish to English@@@@1@3@@danf@17-8-2009 10330290@unknown@formal@none@1@S@**Arabic to English@@@@1@3@@danf@17-8-2009 10330300@unknown@formal@none@1@S@*2nd stage@@@@1@2@@danf@17-8-2009 10330310@unknown@formal@none@1@S@**English to Portuguese@@@@1@3@@danf@17-8-2009 10330320@unknown@formal@none@1@S@**Portuguese to English@@@@1@3@@danf@17-8-2009 10330330@unknown@formal@none@1@S@*3rd stage@@@@1@2@@danf@17-8-2009 10330340@unknown@formal@none@1@S@**English to Italian@@@@1@3@@danf@17-8-2009 10330350@unknown@formal@none@1@S@**Italian to English@@@@1@3@@danf@17-8-2009 10330360@unknown@formal@none@1@S@*4th stage@@@@1@2@@danf@17-8-2009 10330370@unknown@formal@none@1@S@**English to Chinese (Simplified) BETA@@@@1@5@@danf@17-8-2009 10330380@unknown@formal@none@1@S@**English to Japanese BETA@@@@1@4@@danf@17-8-2009 10330390@unknown@formal@none@1@S@**English to Korean BETA@@@@1@4@@danf@17-8-2009 10330400@unknown@formal@none@1@S@**Chinese (Simplified) to English BETA@@@@1@5@@danf@17-8-2009 10330410@unknown@formal@none@1@S@**Japanese to English BETA@@@@1@4@@danf@17-8-2009 10330420@unknown@formal@none@1@S@**Korean to English BETA@@@@1@4@@danf@17-8-2009 10330430@unknown@formal@none@1@S@*5th stage@@@@1@2@@danf@17-8-2009 10330440@unknown@formal@none@1@S@**English to Russian BETA@@@@1@4@@danf@17-8-2009 10330450@unknown@formal@none@1@S@**Russian to English BETA@@@@1@4@@danf@17-8-2009 10330460@unknown@formal@none@1@S@*6th stage@@@@1@2@@danf@17-8-2009 10330470@unknown@formal@none@1@S@**English to Arabic BETA@@@@1@4@@danf@17-8-2009 10330480@unknown@formal@none@1@S@**Arabic to English BETA@@@@1@4@@danf@17-8-2009 10330490@unknown@formal@none@1@S@*7th stage (launched February, 2007)@@@@1@5@@danf@17-8-2009 10330500@unknown@formal@none@1@S@**English to Chinese (Traditional) BETA@@@@1@5@@danf@17-8-2009 10330510@unknown@formal@none@1@S@**Chinese (Traditional) to English BETA@@@@1@5@@danf@17-8-2009 10330520@unknown@formal@none@1@S@**Chinese (Simplified to Traditional) BETA@@@@1@5@@danf@17-8-2009 10330530@unknown@formal@none@1@S@**Chinese (Traditional to Simplified) BETA@@@@1@5@@danf@17-8-2009 10330540@unknown@formal@none@1@S@*8th stage (launched October, 2007)@@@@1@5@@danf@17-8-2009 10330550@unknown@formal@none@1@S@** all 25 language pairs use Google's machine translation system@@@@1@10@@danf@17-8-2009 10330560@unknown@formal@none@1@S@*9th stage@@@@1@2@@danf@17-8-2009 10330570@unknown@formal@none@1@S@**English to Hindi BETA@@@@1@4@@danf@17-8-2009 10330580@unknown@formal@none@1@S@**Hindi to English BETA@@@@1@4@@danf@17-8-2009 10330590@unknown@formal@none@1@S@*10th stage (as of this stage, translation can be done between any 
two languages)@@@@1@14@@danf@17-8-2009 10330600@unknown@formal@none@1@S@**Bulgarian@@@@1@1@@danf@17-8-2009 10330610@unknown@formal@none@1@S@**Croatian@@@@1@1@@danf@17-8-2009 10330620@unknown@formal@none@1@S@**Czech@@@@1@1@@danf@17-8-2009 10330630@unknown@formal@none@1@S@**Danish@@@@1@1@@danf@17-8-2009 10330640@unknown@formal@none@1@S@**Dutch@@@@1@1@@danf@17-8-2009 10330650@unknown@formal@none@1@S@**Finnish@@@@1@1@@danf@17-8-2009 10330660@unknown@formal@none@1@S@**Greek@@@@1@1@@danf@17-8-2009 10330670@unknown@formal@none@1@S@**Norwegian@@@@1@1@@danf@17-8-2009 10330680@unknown@formal@none@1@S@**Polish@@@@1@1@@danf@17-8-2009 10330690@unknown@formal@none@1@S@**Romanian@@@@1@1@@danf@17-8-2009 10330700@unknown@formal@none@1@S@**Swedish@@@@1@1@@danf@17-8-2009 10340010@unknown@formal@none@1@S@
Grammar
@@@@1@1@@danf@17-8-2009 10340020@unknown@formal@none@1@S@'''Grammar''' is the field of [[linguistics]] that covers the [[rules]] governing the use of any given [[natural language|natural language]].@@@@1@19@@danf@17-8-2009 10340030@unknown@formal@none@1@S@It includes [[morphology (linguistics)|morphology]] and [[syntax]], often complemented by [[phonetics]], [[phonology]], [[semantics]], and [[pragmatics]].@@@@1@14@@danf@17-8-2009 10340040@unknown@formal@none@1@S@Each language has its own distinct grammar.@@@@1@7@@danf@17-8-2009 10340050@unknown@formal@none@1@S@"English grammar" is the rules of the English language itself.@@@@1@10@@danf@17-8-2009 10340060@unknown@formal@none@1@S@"''An'' English grammar" is a specific study or analysis of these rules.@@@@1@12@@danf@17-8-2009 10340070@unknown@formal@none@1@S@A [[reference book]] describing the grammar of a language is called a "reference grammar" or simply "a grammar".@@@@1@18@@danf@17-8-2009 10340080@unknown@formal@none@1@S@A fully explicit grammar exhaustively describing the [[grammaticality|grammatical]] constructions of a language is called a descriptive grammar, as opposed to [[linguistic prescription]] which tries to enforce the governing rules how a language is to be used.@@@@1@36@@danf@17-8-2009 10340090@unknown@formal@none@1@S@[[Grammatical framework]]s are approaches to constructing grammars.@@@@1@7@@danf@17-8-2009 10340100@unknown@formal@none@1@S@The standard framework of [[generative grammar]] is the [[transformational grammar]] model developed by [[Noam Chomsky]] and his followers from the 1950s to 1980s.@@@@1@23@@danf@17-8-2009 10340110@unknown@formal@none@1@S@==Etymology==@@@@1@1@@danf@17-8-2009 10340120@unknown@formal@none@1@S@The word "grammar," derives from [[Greek language|Greek]] ''γραμματική τέχνη'' (''grammatike techne''), which means "art of letters," from ''γράμμα'' (''gramma''), "letter," and that from ''γράφειν'' (''graphein''), "to draw, to write".@@@@1@29@@danf@17-8-2009 10340130@unknown@formal@none@1@S@==History==@@@@1@1@@danf@17-8-2009 10340140@unknown@formal@none@1@S@The first systematic grammars originate in [[Iron Age India]], with [[Panini (grammarian)|Panini]] (4th c. BC) and his commentators [[Pingala]] (ca. 200 BC), [[Katyayana]], and [[Patanjali]] (2nd c. BC).@@@@1@28@@danf@17-8-2009 10340150@unknown@formal@none@1@S@In the West, grammar emerges as a discipline in [[Hellenism]] from the 3rd c. BC forward with authors like [[Rhyanus]] and [[Aristarchus of Samothrace]], the oldest extant work being the ''[[Art of Grammar]]'' ({{lang|grc|Τέχνη Γραμματική}}), attributed to [[Dionysius Thrax]] (ca. 
100 BC).@@@@1@42@@danf@17-8-2009 10340160@unknown@formal@none@1@S@[[Latin grammar]] developed by following Greek models from the 1st century BC, due to the work of authors such as [[Orbilius Pupillus]], [[Remmius Palaemon]], [[Marcus Valerius Probus]], [[Verrius Flaccus]], [[Aemilius Asper]].@@@@1@31@@danf@17-8-2009 10340170@unknown@formal@none@1@S@Tamil grammatical tradition also began around the 1st century BC with the [[Tolkāppiyam]].@@@@1@13@@danf@17-8-2009 10340180@unknown@formal@none@1@S@A grammar of [[Old Irish|Irish]] originated in the 7th century with the [[Auraicept na n-Éces]].@@@@1@15@@danf@17-8-2009 10340190@unknown@formal@none@1@S@[[Arabic grammar]] emerges from the 8th century with the work of [[Ibn Abi Ishaq]] and his students.@@@@1@17@@danf@17-8-2009 10340200@unknown@formal@none@1@S@The first treatises on [[Hebrew grammar]] appear in the [[High Middle Ages]], in the context of [[Mishnah]] (exegesis of the [[Hebrew Bible]]).@@@@1@22@@danf@17-8-2009 10340210@unknown@formal@none@1@S@The [[Karaite]] tradition originates in [[Abbasid]] [[Baghdad]].@@@@1@7@@danf@17-8-2009 10340220@unknown@formal@none@1@S@The ''[[Diqduq]]'' (10th century) is one of the earliest grammatical commentaries on the Hebrew Bible.@@@@1@15@@danf@17-8-2009 10340230@unknown@formal@none@1@S@[[Ibn Barun]] in the 12th century compares the Hebrew language with [[Arabic language|Arabic]] in the [[Islamic grammatical tradition]].@@@@1@18@@danf@17-8-2009 10340240@unknown@formal@none@1@S@Belonging to the ''trivium'' of the seven [[liberal arts]], grammar was taught as a core discipline throughout the [[Middle Ages]], following the influence of authors from [[Late Antiquity]], such as [[Priscian]].@@@@1@31@@danf@17-8-2009 10340250@unknown@formal@none@1@S@Treatment of vernaculars begins gradually during the [[High Middle Ages]], with isolated works such as the [[First Grammatical Treatise]], but becomes influential only in the [[Renaissance]] and [[Baroque]] periods.@@@@1@29@@danf@17-8-2009 10340260@unknown@formal@none@1@S@In [[1486]], [[Antonio de Nebrija]] published ''Las introduciones Latinas contrapuesto el romance al Latin'', and the first [[Spanish grammar]], ''Gramática de la lengua castellana'', in 1492.@@@@1@26@@danf@17-8-2009 10340270@unknown@formal@none@1@S@During the 16th century [[Italian Renaissance]], the ''Questione della lingua'' was the discussion on the status and ideal form of the [[Italian language]], initiated by [[Dante]]'s ''[[de vulgari eloquentia]]'' ([[Pietro Bembo]], ''Prose della volgar lingua'' Venice 1525).@@@@1@37@@danf@17-8-2009 10340280@unknown@formal@none@1@S@Grammars of non-European languages began to be compiled for the purposes of [[evangelization]] and [[Bible translation]] from the 16th century onward, such as ''Grammatica o Arte de la Lengua General de los Indios de los Reynos del Perú'' (1560), and a [[Quechua]] grammar by [[Fray Domingo de Santo Tomás]].@@@@1@49@@danf@17-8-2009 10340290@unknown@formal@none@1@S@In 1643 there appeared [[Ivan Uzhevych]]'s ''Grammatica sclavonica'' and, in 1762, the ''Short Introduction to English Grammar'' of [[Robert Lowth]] was also published.@@@@1@23@@danf@17-8-2009 10340300@unknown@formal@none@1@S@The ''Grammatisch-Kritisches Wörterbuch der hochdeutschen Mundart'', a [[High German]] grammar in five volumes by [[Johann Christoph Adelung]], appeared as early as 1774.@@@@1@22@@danf@17-8-2009 10340310@unknown@formal@none@1@S@From the latter part of the 18th century, grammar came to be understood as a subfield of the emerging discipline of 
modern [[linguistics]].@@@@1@23@@danf@17-8-2009 10340320@unknown@formal@none@1@S@The Serbian grammar by [[Vuk Stefanović Karadžić]] arrived in 1814, while the ''Deutsche Grammatik'' of the [[Brothers Grimm]] was first published in 1818.@@@@1@23@@danf@17-8-2009 10340330@unknown@formal@none@1@S@The ''Comparative Grammar'' of [[Franz Bopp]], the starting point of modern [[comparative linguistics]], came out in 1833.@@@@1@17@@danf@17-8-2009 10340340@unknown@formal@none@1@S@In the [[USA]], the Society for the Promotion of Good Grammar has designated March 4, 2008 as National Grammar Day.@@@@1@20@@danf@17-8-2009 10340350@unknown@formal@none@1@S@==Development of grammars==@@@@1@3@@danf@17-8-2009 10340360@unknown@formal@none@1@S@Grammars evolve through usage, and grammars also develop due to separations of the human population.@@@@1@15@@danf@17-8-2009 10340370@unknown@formal@none@1@S@With the advent of written [[Knowledge representation|representation]]s, formal rules about language usage tend to appear also.@@@@1@16@@danf@17-8-2009 10340380@unknown@formal@none@1@S@Formal grammars are [[codification (linguistics)|codifications]] of usage that are developed by repeated documentation over time, and by [[observation]] as well.@@@@1@20@@danf@17-8-2009 10340390@unknown@formal@none@1@S@As the rules become established and developed, the prescriptive concept of grammatical correctness can arise.@@@@1@15@@danf@17-8-2009 10340400@unknown@formal@none@1@S@This often creates a discrepancy between contemporary usage and that which has been accepted over time as being correct.@@@@1@19@@danf@17-8-2009 10340410@unknown@formal@none@1@S@Linguists tend to believe that prescriptive grammars do not have any justification beyond their authors' aesthetic tastes; however, prescriptions are considered in [[sociolinguistics]] as part of the explanation for why some people say "I didn't do nothing", some say "I didn't do anything", and some say one or the other depending on social context.@@@@1@54@@danf@17-8-2009 10340420@unknown@formal@none@1@S@The formal study of grammar is an important part of [[education]] for children from a young age through advanced [[learning]], though the rules taught in schools are not a "grammar" in the sense most [[linguistics|linguists]] use the term, as they are often [[prescriptive]] rather than [[descriptive]].@@@@1@46@@danf@17-8-2009 10340430@unknown@formal@none@1@S@[[Constructed language]]s (also called planned languages or conlangs) are more common in the modern day.@@@@1@15@@danf@17-8-2009 10340440@unknown@formal@none@1@S@Many have been designed to aid human [[communication]] (for example, naturalistic [[Interlingua]], schematic [[Esperanto]], and the highly logic-compatible artificial language [[Lojban]]).@@@@1@21@@danf@17-8-2009 10340450@unknown@formal@none@1@S@Each of these languages has its own grammar.@@@@1@8@@danf@17-8-2009 10340460@unknown@formal@none@1@S@No clear line can be drawn between syntax and morphology.@@@@1@10@@danf@17-8-2009 10340470@unknown@formal@none@1@S@[[Analytic languages]] use [[syntax]] to convey information that is encoded via [[inflection]] in [[synthetic language]]s.@@@@1@15@@danf@17-8-2009 10340480@unknown@formal@none@1@S@In other words, word order is not significant and [[morphology (linguistics)|morphology]] is highly significant in a purely synthetic language, whereas morphology is not significant and syntax is highly significant in an analytic language.@@@@1@33@@danf@17-8-2009 10340490@unknown@formal@none@1@S@[[Chinese language|Chinese]] and [[Afrikaans 
language|Afrikaans]], for example, are highly analytic, and meaning is therefore very context – dependent.@@@@1@18@@danf@17-8-2009 10340500@unknown@formal@none@1@S@(Both do have some inflections, and have had more in the past; thus, they are becoming even less synthetic and more "purely" analytic over time.)@@@@1@25@@danf@17-8-2009 10340510@unknown@formal@none@1@S@[[Latin]], which is highly [[synthetic language|synthetic]], uses [[affix]]es and [[inflection]]s to convey the same information that Chinese does with [[syntax]].@@@@1@20@@danf@17-8-2009 10340520@unknown@formal@none@1@S@Because Latin words are quite (though not completely) self-contained, an intelligible Latin [[Sentence (linguistics)|sentence]] can be made from elements that are placed in a largely arbitrary order.@@@@1@27@@danf@17-8-2009 10340530@unknown@formal@none@1@S@Latin has a complex affixation and a simple syntax, while Chinese has the opposite.@@@@1@14@@danf@17-8-2009 10340540@unknown@formal@none@1@S@==Grammar frameworks==@@@@1@2@@danf@17-8-2009 10340550@unknown@formal@none@1@S@Various "grammar frameworks" have been developed in [[theoretical linguistics]] since the mid 20th century, in particular under the influence of the idea of a "[[Universal grammar]]" in the USA.@@@@1@29@@danf@17-8-2009 10340560@unknown@formal@none@1@S@Of these, the main divisions are:@@@@1@6@@danf@17-8-2009 10340570@unknown@formal@none@1@S@*[[Transformational grammar]] (TG))@@@@1@3@@danf@17-8-2009 10340580@unknown@formal@none@1@S@*[[Principles and Parameters|Principles and Parameters Theory]] (P&P)@@@@1@7@@danf@17-8-2009 10340590@unknown@formal@none@1@S@*[[Lexical functional grammar|Lexical-functional Grammar]] (LFG)@@@@1@5@@danf@17-8-2009 10340600@unknown@formal@none@1@S@*[[Generalised Phrase Structure Grammar|Generalized Phrase Structure Grammar]] (GPSG)@@@@1@8@@danf@17-8-2009 10340610@unknown@formal@none@1@S@*[[Head-Driven Phrase Structure Grammar]] (HPSG)@@@@1@5@@danf@17-8-2009 10340620@unknown@formal@none@1@S@*[[Dependency grammar]]s (DG)@@@@1@3@@danf@17-8-2009 10340630@unknown@formal@none@1@S@*[[Role and reference grammar]] (RRG)@@@@1@5@@danf@17-8-2009 10350010@unknown@formal@none@1@S@
Hidden Markov model
@@@@1@3@@danf@17-8-2009 10350020@unknown@formal@none@1@S@A '''hidden Markov model''' ('''HMM''') is a [[statistical model]] in which the system being modeled is assumed to be a [[Markov process]] with unknown parameters, and the challenge is to determine the hidden parameters from the [[observable]] parameters.@@@@1@38@@danf@17-8-2009 10350030@unknown@formal@none@1@S@The extracted model parameters can then be used to perform further analysis, for example for [[pattern recognition]] applications.@@@@1@18@@danf@17-8-2009 10350040@unknown@formal@none@1@S@An HMM can be considered as the simplest [[dynamic Bayesian network]].@@@@1@11@@danf@17-8-2009 10350050@unknown@formal@none@1@S@In a regular [[Markov model]], the state is directly visible to the observer, and therefore the state transition probabilities are the only parameters.@@@@1@23@@danf@17-8-2009 10350060@unknown@formal@none@1@S@In a ''hidden'' Markov model, the state is not directly visible, but variables influenced by the state are visible.@@@@1@19@@danf@17-8-2009 10350070@unknown@formal@none@1@S@Each state has a probability distribution over the possible output tokens.@@@@1@11@@danf@17-8-2009 10350080@unknown@formal@none@1@S@Therefore the sequence of tokens generated by an HMM gives some information about the sequence of states.@@@@1@17@@danf@17-8-2009 10350090@unknown@formal@none@1@S@Hidden Markov models are especially known for their application in [[time| temporal]] pattern recognition such as [[speech recognition|speech]], [[handwriting recognition|handwriting]], [[gesture recognition]], [[musical score]] following, [[partial discharge]]s and [[bioinformatics]].@@@@1@29@@danf@17-8-2009 10350100@unknown@formal@none@1@S@== Architecture of a hidden Markov model ==@@@@1@8@@danf@17-8-2009 10350110@unknown@formal@none@1@S@The diagram below shows the general architecture of an instantiated HMM.@@@@1@11@@danf@17-8-2009 10350120@unknown@formal@none@1@S@Each oval shape represents a random variable that can adopt a number of values.@@@@1@14@@danf@17-8-2009 10350130@unknown@formal@none@1@S@The random variable x(t) is the hidden state at time t (with the model from the above diagram, x(t) \\in \\{x_1, x_2, x_3\\}).@@@@1@23@@danf@17-8-2009 10350140@unknown@formal@none@1@S@The random variable y(t) is the observation at time t (y(t) \\in \\{y_1, y_2, y_3, y_4\\}).@@@@1@16@@danf@17-8-2009 10350150@unknown@formal@none@1@S@The arrows in the diagram (often called a [[Trellis (graph)|trellis diagram]]) denote conditional dependencies.@@@@1@14@@danf@17-8-2009 10350160@unknown@formal@none@1@S@From the diagram, it is clear that the value of the hidden variable x(t) (at time t) ''only'' depends on the value of the hidden variable x(t-1) : the values at time t-2 and before have no influence.@@@@1@38@@danf@17-8-2009 10350170@unknown@formal@none@1@S@This is called the [[Markov property]].@@@@1@6@@danf@17-8-2009 10350180@unknown@formal@none@1@S@Similarly, the value of the observed variable y(t) only depends on the value of the hidden variable x(t) (both at time t).@@@@1@22@@danf@17-8-2009 10350190@unknown@formal@none@1@S@==Probability of an observed sequence==@@@@1@5@@danf@17-8-2009 10350200@unknown@formal@none@1@S@The probability of observing a sequence Y=y(0), y(1),\\dots,y(L-1) of length L is given by@@@@1@14@@danf@17-8-2009 10350210@unknown@formal@none@1@S@:P(Y)=\\sum_{X}P(Y\\mid X)P(X),@@@@1@2@@danf@17-8-2009 10350220@unknown@formal@none@1@S@where the sum runs over all possible hidden node sequences X=x(0), x(1), \\dots, x(L-1).@@@@1@14@@danf@17-8-2009 
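As a concrete illustration of how this probability can be evaluated efficiently, the following minimal Python sketch implements the forward algorithm mentioned just below; the two-state model parameters (pi, A, B) and the example observation sequence are illustrative assumptions, not values taken from this article.

 # Illustrative two-state HMM over three output tokens; parameters are assumed for this example only.
 pi = [0.6, 0.4]                              # pi[i]   = P(x(0) = i)
 A = [[0.7, 0.3], [0.4, 0.6]]                 # A[i][j] = P(x(t+1) = j | x(t) = i)
 B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]       # B[i][k] = P(y(t) = k | x(t) = i)
 
 def forward(observations):
     """Return P(Y) for a list of observed token indices, summed over all hidden state sequences."""
     n = len(pi)
     # alpha[i] = P(y(0..t), x(t) = i), initialised at t = 0
     alpha = [pi[i] * B[i][observations[0]] for i in range(n)]
     for obs in observations[1:]:
         alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs] for j in range(n)]
     return sum(alpha)
 
 print(forward([0, 2, 1]))  # P(Y) for the token sequence 0, 2, 1

Each update step costs O(n^2) for n hidden states, rather than enumerating all n^L possible hidden state sequences of length L.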
10350230@unknown@formal@none@1@S@Brute force calculation of P(Y) is intractable for most real-life problems, as the number of possible hidden node sequences is typically extremely high.@@@@1@23@@danf@17-8-2009 10350240@unknown@formal@none@1@S@The calculation can however be sped up enormously using the [[Viterbi algorithm|forward algorithm]] or the equivalent backward algorithm.@@@@1@18@@danf@17-8-2009 10350250@unknown@formal@none@1@S@==Using hidden Markov models==@@@@1@4@@danf@17-8-2009 10350260@unknown@formal@none@1@S@There are three [[canonical]] problems associated with HMM:@@@@1@8@@danf@17-8-2009 10350270@unknown@formal@none@1@S@* Given the parameters of the model, compute the probability of a particular output sequence, and the probabilities of the hidden state values given that output sequence.@@@@1@27@@danf@17-8-2009 10350280@unknown@formal@none@1@S@This problem is solved by the [[forward-backward algorithm]].@@@@1@8@@danf@17-8-2009 10350290@unknown@formal@none@1@S@* Given the parameters of the model, find the most likely sequence of hidden states that could have generated a given output sequence.@@@@1@23@@danf@17-8-2009 10350300@unknown@formal@none@1@S@This problem is solved by the [[Viterbi algorithm]].@@@@1@8@@danf@17-8-2009 10350310@unknown@formal@none@1@S@* Given an output sequence or a set of such sequences, find the most likely set of state transition and output probabilities.@@@@1@22@@danf@17-8-2009 10350320@unknown@formal@none@1@S@In other words, discover the parameters of the HMM given a dataset of sequences.@@@@1@14@@danf@17-8-2009 10350330@unknown@formal@none@1@S@This problem is solved by the [[Baum-Welch algorithm]].@@@@1@8@@danf@17-8-2009 10350340@unknown@formal@none@1@S@=== A concrete example ===@@@@1@5@@danf@17-8-2009 10350350@unknown@formal@none@1@S@''This example is further elaborated in the [[Viterbi algorithm]] page.''@@@@1@10@@danf@17-8-2009 10350360@unknown@formal@none@1@S@===Applications of hidden Markov models===@@@@1@5@@danf@17-8-2009 10350370@unknown@formal@none@1@S@* [[Cryptanalysis]]@@@@1@2@@danf@17-8-2009 10350380@unknown@formal@none@1@S@* [[Speech recognition]]@@@@1@3@@danf@17-8-2009 10350390@unknown@formal@none@1@S@* [[Machine translation]]@@@@1@3@@danf@17-8-2009 10350400@unknown@formal@none@1@S@* [[Partial discharge]]@@@@1@3@@danf@17-8-2009 10350410@unknown@formal@none@1@S@== History ==@@@@1@3@@danf@17-8-2009 10350420@unknown@formal@none@1@S@Hidden Markov Models were first described in a series of statistical papers by [[Leonard E. Baum]] and other authors in the second half of the 1960s.@@@@1@26@@danf@17-8-2009 10350430@unknown@formal@none@1@S@One of the first applications of HMMs was [[speech recognition]], starting in the mid-1970s.@@@@1@14@@danf@17-8-2009 10350440@unknown@formal@none@1@S@In the second half of the 1980s, HMMs began to be applied to the analysis of biological sequences, in particular [[DNA]].@@@@1@21@@danf@17-8-2009 10350450@unknown@formal@none@1@S@Since then, they have become ubiquitous in the field of [[bioinformatics]].@@@@1@11@@danf@17-8-2009 10360010@unknown@formal@none@1@S@
HTML
@@@@1@1@@danf@17-8-2009 10360020@unknown@formal@none@1@S@'''HTML''', an [[Acronym and initialism|initialism]] of '''HyperText Markup Language''', is the predominant [[markup language]] for [[web page]]s.@@@@1@17@@danf@17-8-2009 10360030@unknown@formal@none@1@S@It provides a means to describe the structure of text-based information in a document — by denoting certain text as links, headings, paragraphs, lists, and so on — and to supplement that text with ''interactive forms'', embedded ''images'', and other objects.@@@@1@41@@danf@17-8-2009 10360040@unknown@formal@none@1@S@HTML is written in the form of tags, surrounded by [[Brackets#Angle brackets or chevrons .3C .3E|angle brackets]].@@@@1@17@@danf@17-8-2009 10360050@unknown@formal@none@1@S@HTML can also describe, to some degree, the appearance and [[semantics]] of a document, and can include embedded [[scripting language]] code (such as JavaScript) which can affect the behavior of [[Web browser]]s and other HTML processors.@@@@1@36@@danf@17-8-2009 10360060@unknown@formal@none@1@S@HTML is also often used to refer to content in specific languages, such as a [[MIME type]] text/html, or even more broadly as a generic term for HTML, whether in its [[XML]]-descended form (such as [[XHTML]] 1.0 and later) or its form descended directly from [[SGML]] (such as HTML 4.01 and earlier).@@@@1@52@@danf@17-8-2009 10360070@unknown@formal@none@1@S@By convention, HTML format data files use a file extension .html or .htm.@@@@1@13@@danf@17-8-2009 10360080@unknown@formal@none@1@S@==History of HTML==@@@@1@3@@danf@17-8-2009 10360090@unknown@formal@none@1@S@===Origins===@@@@1@1@@danf@17-8-2009 10360100@unknown@formal@none@1@S@In 1980, physicist [[Tim Berners-Lee]], who was an independent contractor at [[CERN]], proposed and prototyped [[ENQUIRE]], a system for CERN researchers to use and share documents.@@@@1@26@@danf@17-8-2009 10360110@unknown@formal@none@1@S@In 1989, Berners-Lee and CERN data systems engineer [[Robert Cailliau]] each submitted separate proposals for an [[Internet]]-based [[hypertext]] system providing similar functionality.@@@@1@22@@danf@17-8-2009 10360120@unknown@formal@none@1@S@The following year, they collaborated on a joint proposal, the WorldWideWeb (W3) project, which was accepted by CERN.@@@@1@18@@danf@17-8-2009 10360130@unknown@formal@none@1@S@===First specifications===@@@@1@2@@danf@17-8-2009 10360140@unknown@formal@none@1@S@The first publicly available description of HTML was a document called ''HTML Tags'', first mentioned on the Internet by Berners-Lee in late 1991.@@@@1@23@@danf@17-8-2009 10360150@unknown@formal@none@1@S@It describes 22 elements comprising the initial, relatively simple design of HTML.@@@@1@12@@danf@17-8-2009 10360160@unknown@formal@none@1@S@Thirteen of these elements still exist in HTML 4.@@@@1@9@@danf@17-8-2009 10360170@unknown@formal@none@1@S@Berners-Lee considered HTML to be, at the time, an application of [[SGML]], but it was not formally defined as such until the mid-1993 publication, by the [[Internet Engineering Task Force|IETF]], of the first proposal for an HTML specification: Berners-Lee and [[Dan Connolly]]'s "Hypertext Markup Language (HTML)" Internet-Draft, which included an SGML [[Document Type Definition]] to define the grammar.@@@@1@58@@danf@17-8-2009 10360180@unknown@formal@none@1@S@The draft expired after six months, but was notable for its acknowledgment of the [[Mosaic (web browser)|NCSA Mosaic]] browser's custom tag for embedding in-line images, reflecting the IETF's philosophy of basing 
standards on successful prototypes.@@@@1@35@@danf@17-8-2009 10360190@unknown@formal@none@1@S@Similarly, Dave Raggett's competing Internet-Draft, "HTML+ (Hypertext Markup Format)", from late 1993, suggested standardizing already-implemented features like tables and fill-out forms.@@@@1@21@@danf@17-8-2009 10360200@unknown@formal@none@1@S@After the HTML and HTML+ drafts expired in early 1994, the IETF created an HTML Working Group, which in 1995 completed "HTML 2.0", the first HTML specification intended to be treated as a standard against which future implementations should be based.@@@@1@41@@danf@17-8-2009 10360210@unknown@formal@none@1@S@Published as [[Request for Comments]] 1996, HTML 2.0 included ideas from the HTML and HTML+ drafts.@@@@1@16@@danf@17-8-2009 10360220@unknown@formal@none@1@S@There was no "HTML 1.0"; the 2.0 designation was intended to distinguish the new edition from previous drafts.@@@@1@18@@danf@17-8-2009 10360230@unknown@formal@none@1@S@Further development under the auspices of the IETF was stalled by competing interests.@@@@1@13@@danf@17-8-2009 10360240@unknown@formal@none@1@S@Since 1996, the HTML specifications have been maintained, with input from commercial software vendors, by the [[World Wide Web Consortium]] (W3C).@@@@1@21@@danf@17-8-2009 10360250@unknown@formal@none@1@S@However, in 2000, HTML also became an international standard ([[International Organization for Standardization|ISO]]/[[International Electrotechnical Commission|IEC]] 15445:2000).@@@@1@16@@danf@17-8-2009 10360260@unknown@formal@none@1@S@The last HTML specification published by the W3C is the HTML 4.01 Recommendation, published in late 1999.@@@@1@17@@danf@17-8-2009 10360270@unknown@formal@none@1@S@Its issues and errors were last acknowledged by errata published in 2001.@@@@1@12@@danf@17-8-2009 10360280@unknown@formal@none@1@S@===Version history of the standard===@@@@1@5@@danf@17-8-2009 10360290@unknown@formal@none@1@S@====HTML versions====@@@@1@2@@danf@17-8-2009 10360300@unknown@formal@none@1@S@'''July, 1993:''' [http://www.w3.org/MarkUp/draft-ietf-iiir-html-01.txt Hypertext Markup Language], was published at [[Internet Engineering Task Force|IETF]] working draft (that is, not yet a standard).@@@@1@21@@danf@17-8-2009 10360310@unknown@formal@none@1@S@'''November, 1995:''' [http://tools.ietf.org/html/rfc1866 HTML 2.0] published as IETF [[Request for Comments]]:@@@@1@11@@danf@17-8-2009 10360320@unknown@formal@none@1@S@* RFC 1866,@@@@1@3@@danf@17-8-2009 10360330@unknown@formal@none@1@S@* supplemented by RFC 1867 (form-based file upload) that same month,@@@@1@11@@danf@17-8-2009 10360340@unknown@formal@none@1@S@* RFC 1942 (tables) in ''May 1996'',@@@@1@7@@danf@17-8-2009 10360350@unknown@formal@none@1@S@* RFC 1980 (client-side image maps) in ''August 1996'', and@@@@1@10@@danf@17-8-2009 10360360@unknown@formal@none@1@S@* RFC 2070 ([[internationalization and localization|internationalization]]) in ''January 1997'';@@@@1@9@@danf@17-8-2009 10360370@unknown@formal@none@1@S@Ultimately, all were declared obsolete/historic by RFC 2854 in ''June 2000''.@@@@1@11@@danf@17-8-2009 10360380@unknown@formal@none@1@S@'''April 1995''': [http://www.w3.org/MarkUp/html3/ HTML 3.0], proposed as a standard to the IETF.@@@@1@12@@danf@17-8-2009 10360390@unknown@formal@none@1@S@It included many of the capabilities that were in Raggett's HTML+ proposal, such as support for tables, text flow around figures, and the display of complex mathematical formulas.@@@@1@28@@danf@17-8-2009 10360400@unknown@formal@none@1@S@A demonstration 
appeared in W3C's own [[Arena (web browser)|Arena browser]].@@@@1@10@@danf@17-8-2009 10360410@unknown@formal@none@1@S@HTML 3.0 did not succeed for several reasons.@@@@1@8@@danf@17-8-2009 10360420@unknown@formal@none@1@S@The pace of browser development, as well as the number of interested parties, had outstripped the resources of the IETF.@@@@1@20@@danf@17-8-2009 10360430@unknown@formal@none@1@S@Netscape continued to introduce HTML elements that specified the visual appearance of documents, contrary to the goals of the newly-formed W3C, which sought to limit HTML to describing logical structure.@@@@1@30@@danf@17-8-2009 10360440@unknown@formal@none@1@S@Microsoft, a newcomer at the time, played to all sides by creating its own tags, implementing Netscape's elements for compatibility, and supporting W3C features such as Cascading Style Sheets.@@@@1@29@@danf@17-8-2009 10360450@unknown@formal@none@1@S@'''[[January 14]], [[1997]]:''' [http://www.w3.org/TR/REC-html32 HTML 3.2], published as a [[W3C Recommendation]].@@@@1@11@@danf@17-8-2009 10360460@unknown@formal@none@1@S@It was the first version developed and standardized exclusively by the W3C, as the IETF had closed its HTML Working Group in September 1997.@@@@1@24@@danf@17-8-2009 10360470@unknown@formal@none@1@S@The new version dropped math formulas entirely, reconciled overlap among various proprietary extensions, and adopted most of Netscape's visual markup tags.@@@@1@21@@danf@17-8-2009 10360480@unknown@formal@none@1@S@Netscape's [[blink element]] and Microsoft's [[marquee element]] were omitted due to a mutual agreement between the two companies.@@@@1@18@@danf@17-8-2009 10360490@unknown@formal@none@1@S@The ability to include mathematical formulas in HTML would not be standardized until years later in [[MathML]].@@@@1@17@@danf@17-8-2009 10360500@unknown@formal@none@1@S@'''[[December 18]], [[1997]]:''' [http://www.w3.org/TR/REC-html40-971218/ HTML 4.0], published as a W3C Recommendation.@@@@1@11@@danf@17-8-2009 10360510@unknown@formal@none@1@S@It offers three "flavors":@@@@1@4@@danf@17-8-2009 10360520@unknown@formal@none@1@S@* Strict, in which deprecated elements are forbidden,@@@@1@8@@danf@17-8-2009 10360530@unknown@formal@none@1@S@* Transitional, in which deprecated elements are allowed,@@@@1@8@@danf@17-8-2009 10360540@unknown@formal@none@1@S@* Frameset, in which mostly only [[Framing (World Wide Web)|frame]] related elements are allowed;@@@@1@14@@danf@17-8-2009 10360550@unknown@formal@none@1@S@HTML 4.0 (initially code-named "Cougar") likewise adopted many browser-specific element types and attributes, but at the same time sought to phase out Netscape's visual markup features by marking them as [[deprecation|deprecated]] in favor of style sheets.@@@@1@36@@danf@17-8-2009 10360560@unknown@formal@none@1@S@Minor editorial revisions to the HTML 4.0 specification were published in 1998 without incrementing the version number and further minor revisions as HTML 4.01.@@@@1@24@@danf@17-8-2009 10360570@unknown@formal@none@1@S@'''[[April 24]], [[1998]]:''' [http://www.w3.org/TR/1998/REC-html40-19980424/ HTML 4.0] was reissued with minor edits without incrementing the version number.@@@@1@16@@danf@17-8-2009 10360580@unknown@formal@none@1@S@'''[[December 24]], [[1999]]:''' [http://www.w3.org/TR/html401 HTML 4.01], published as a W3C Recommendation.@@@@1@11@@danf@17-8-2009 10360590@unknown@formal@none@1@S@It offers the same three flavors as HTML 4.0, and its last [http://www.w3.org/MarkUp/html4-updates/errata errata] were published [[May 12]], 
[[2001]].@@@@1@19@@danf@17-8-2009 10360600@unknown@formal@none@1@S@HTML 4.01 and ISO/IEC 15445:2000 are the most recent and final versions of HTML.@@@@1@14@@danf@17-8-2009 10360610@unknown@formal@none@1@S@'''[[May 15]], [[2000]]:''' [https://www.cs.tcd.ie/15445/15445.HTML ISO/IEC 15445:2000] ("[[International Organization for Standardization|ISO]] HTML", based on HTML 4.01 Strict), published as an ISO/IEC international standard.@@@@1@22@@danf@17-8-2009 10360620@unknown@formal@none@1@S@'''[[January 22]], [[2008]]:''' [http://www.w3.org/TR/html5/ HTML 5], published as a Working Draft by W3C.@@@@1@13@@danf@17-8-2009 10360630@unknown@formal@none@1@S@====XHTML versions====@@@@1@2@@danf@17-8-2009 10360640@unknown@formal@none@1@S@XHTML is a separate language that began as a reformulation of HTML 4.01 using XML 1.0.@@@@1@16@@danf@17-8-2009 10360650@unknown@formal@none@1@S@It continues to be developed:@@@@1@5@@danf@17-8-2009 10360660@unknown@formal@none@1@S@* [http://www.w3.org/TR/xhtml1/ XHTML 1.0], published [[January 26]], [[2000]] as a W3C Recommendation, later revised and republished [[August 1]], [[2002]].@@@@1@19@@danf@17-8-2009 10360670@unknown@formal@none@1@S@It offers the same three flavors as HTML 4.0 and 4.01, reformulated in XML, with minor restrictions.@@@@1@17@@danf@17-8-2009 10360680@unknown@formal@none@1@S@* [http://www.w3.org/TR/xhtml11/ XHTML 1.1], published [[May 31]], [[2001]] as a W3C Recommendation.@@@@1@12@@danf@17-8-2009 10360690@unknown@formal@none@1@S@It is based on XHTML 1.0 Strict, but includes minor changes, can be customized, and is reformulated using modules from [http://www.w3.org/TR/xhtml-modularization Modularization of XHTML], which was published [[April 10]], [[2001]] as a W3C Recommendation.@@@@1@34@@danf@17-8-2009 10360700@unknown@formal@none@1@S@* [http://www.w3.org/TR/xhtml2/ XHTML 2.0] is still a W3C Working Draft.@@@@1@10@@danf@17-8-2009 10360710@unknown@formal@none@1@S@XHTML 2.0 is incompatible with XHTML 1.x and, therefore, would be more accurate to characterize as an XHTML-inspired new language than an update to XHTML 1.x.@@@@1@26@@danf@17-8-2009 10360720@unknown@formal@none@1@S@* XHTML 5, which is an update to XHTML 1.x, is being defined alongside [[HTML 5]] in the [http://www.w3.org/html/wg/html5/ HTML 5 draft].@@@@1@22@@danf@17-8-2009 10360730@unknown@formal@none@1@S@==HTML markup==@@@@1@2@@danf@17-8-2009 10360740@unknown@formal@none@1@S@HTML markup consists of several key components, including ''elements'' (and their ''attributes''), character-based ''data types'', and ''character references'' and ''entity references''.@@@@1@21@@danf@17-8-2009 10360750@unknown@formal@none@1@S@Another important component is the ''document type declaration''.@@@@1@8@@danf@17-8-2009 10360760@unknown@formal@none@1@S@HTML [[Hello world program|Hello World]]: Hello HTML Hello World! @@@@1@18@@danf@17-8-2009 10360770@unknown@formal@none@1@S@===Elements===@@@@1@1@@danf@17-8-2009 10360780@unknown@formal@none@1@S@:''See [[HTML element]]s for more detailed descriptions.''@@@@1@7@@danf@17-8-2009 10360790@unknown@formal@none@1@S@Elements are the basic structure for HTML markup.@@@@1@8@@danf@17-8-2009 10360800@unknown@formal@none@1@S@Elements have two basic properties: attributes and content.@@@@1@8@@danf@17-8-2009 10360810@unknown@formal@none@1@S@Each attribute and each element's content has certain restrictions that must be followed for an HTML document to be considered valid.@@@@1@21@@danf@17-8-2009 10360820@unknown@formal@none@1@S@An element usually has a start tag (e.g. 
<element-name>) and an end tag (e.g. </element-name>).@@@@1@15@@danf@17-8-2009 10360830@unknown@formal@none@1@S@The element's attributes are contained in the start tag and content is located between the tags (e.g. <element-name attribute="value">Content</element-name>).@@@@1@18@@danf@17-8-2009 10360840@unknown@formal@none@1@S@Some elements, such as <br>, do not have any content and must not have a closing tag.@@@@1@17@@danf@17-8-2009 10360850@unknown@formal@none@1@S@Listed below are several types of markup elements used in HTML.@@@@1@11@@danf@17-8-2009 10360860@unknown@formal@none@1@S@'''Structural''' markup describes the purpose of text.@@@@1@7@@danf@17-8-2009 10360870@unknown@formal@none@1@S@For example,

<h2>Golf</h2>
establishes "Golf" as a second-level [[heading]], which would be rendered in a browser in a manner similar to the "HTML markup" title at the start of this section.@@@@1@31@@danf@17-8-2009 10360880@unknown@formal@none@1@S@Structural markup does not denote any specific rendering, but most Web browsers have standardized on how elements should be formatted.@@@@1@20@@danf@17-8-2009 10360890@unknown@formal@none@1@S@Text may be further styled with [[Cascading Style Sheets]] (CSS).@@@@1@10@@danf@17-8-2009 10360900@unknown@formal@none@1@S@'''Presentational''' markup describes the appearance of the text, regardless of its function.@@@@1@12@@danf@17-8-2009 10360910@unknown@formal@none@1@S@For example boldface indicates that visual output devices should render "boldface" in bold text, but gives no indication what devices which are unable to do this (such as aural devices that read the text aloud) should do.@@@@1@37@@danf@17-8-2009 10360920@unknown@formal@none@1@S@In the case of both bold and italic, there are elements which usually have an equivalent visual rendering but are more semantic in nature, namely strong emphasis and emphasis respectively.@@@@1@30@@danf@17-8-2009 10360930@unknown@formal@none@1@S@It is easier to see how an aural user agent should interpret the latter two elements.@@@@1@16@@danf@17-8-2009 10360940@unknown@formal@none@1@S@However, they are not equivalent to their presentational counterparts: it would be undesirable for a screen-reader to emphasize the name of a book, for instance, but on a screen such a name would be italicized.@@@@1@35@@danf@17-8-2009 10360950@unknown@formal@none@1@S@Most presentational markup elements have become [[Deprecation|deprecated]] under the HTML 4.0 specification, in favor of [[Cascading Style Sheets|CSS]] based style design.@@@@1@21@@danf@17-8-2009 10360960@unknown@formal@none@1@S@'''Hypertext''' markup links parts of the document to other documents.@@@@1@10@@danf@17-8-2009 10360970@unknown@formal@none@1@S@HTML up through version [[XHTML]] 1.1 requires the use of an anchor element to create a hyperlink in the flow of text: Wikipedia.@@@@1@23@@danf@17-8-2009 10360980@unknown@formal@none@1@S@However, the href attribute must also be set to a valid [[Uniform Resource Locator|URL]] so for example the HTML code, Wikipedia, will render the word "[http://en.wikipedia.org/ Wikipedia]" as a [[hyperlink]].@@@@1@31@@danf@17-8-2009 10360985@unknown@formal@none@1@S@To link on an image, the anchor tag use the following syntax: @@@@1@16@@danf@17-8-2009 10360990@unknown@formal@none@1@S@===Attributes===@@@@1@1@@danf@17-8-2009 10361000@unknown@formal@none@1@S@Most of the attributes of an element are name-value pairs, separated by "=", and written within the start tag of an element, after the element's name.@@@@1@26@@danf@17-8-2009 10361010@unknown@formal@none@1@S@The value may be enclosed in single or double quotes, although values consisting of certain characters can be left unquoted in HTML (but not XHTML).@@@@1@25@@danf@17-8-2009 10361020@unknown@formal@none@1@S@Leaving attribute values unquoted is considered unsafe.@@@@1@7@@danf@17-8-2009 10361030@unknown@formal@none@1@S@In contrast with name-value pair attributes, there are some attributes that affect the element simply by their presence in the start tag of the element (like the ismap attribute for the img element).@@@@1@33@@danf@17-8-2009 10361040@unknown@formal@none@1@S@Most elements can take any of several common attributes:@@@@1@9@@danf@17-8-2009 10361050@unknown@formal@none@1@S@* The id attribute provides a 
document-wide unique identifier for an element.@@@@1@12@@danf@17-8-2009 10361060@unknown@formal@none@1@S@This can be used by stylesheets to provide presentational properties, by browsers to focus attention on the specific element, or by scripts to alter the contents or presentation of an element.@@@@1@31@@danf@17-8-2009 10361070@unknown@formal@none@1@S@* The class attribute provides a way of classifying similar elements for presentation purposes.@@@@1@14@@danf@17-8-2009 10361080@unknown@formal@none@1@S@For example, an HTML document might use the designation class="notation" to indicate that all elements with this class value are subordinate to the main text of the document.@@@@1@28@@danf@17-8-2009 10361090@unknown@formal@none@1@S@Such elements might be gathered together and presented as footnotes on a page instead of appearing in the place where they occur in the HTML source.@@@@1@26@@danf@17-8-2009 10361100@unknown@formal@none@1@S@* An author may use the style attribute to assign presentational properties to a particular element.@@@@1@15@@danf@17-8-2009 10361110@unknown@formal@none@1@S@It is considered better practice to use an element’s id or class attribute and select the element with a stylesheet, though sometimes this can be too cumbersome for a simple ad hoc application of styled properties.@@@@1@35@@danf@17-8-2009 10361120@unknown@formal@none@1@S@* The title attribute is used to attach subtextual explanation to an element.@@@@1@13@@danf@17-8-2009 10361130@unknown@formal@none@1@S@In most browsers this attribute is displayed as what is often referred to as a [[tooltip]].@@@@1@16@@danf@17-8-2009 10361140@unknown@formal@none@1@S@The generic inline element span can be used to demonstrate these various attributes:@@@@1@13@@danf@17-8-2009 10361150@unknown@formal@none@1@S@::<span id="anId" class="aClass" style="color:blue;" title="Hypertext Markup Language">HTML</span>@@@@1@8@@danf@17-8-2009 10361160@unknown@formal@none@1@S@This example displays as HTML; in most browsers, pointing the cursor at the abbreviation should display the title text "Hypertext Markup Language."@@@@1@28@@danf@17-8-2009 10361170@unknown@formal@none@1@S@Most elements also take the language-related attributes lang and dir.@@@@1@10@@danf@17-8-2009 10361180@unknown@formal@none@1@S@===Character and entity references===@@@@1@4@@danf@17-8-2009 10361190@unknown@formal@none@1@S@As of version 4.0, HTML defines a set of [[List of XML and HTML character entity references|252]] [[character entity reference]]s and a set of 1,114,050 [[numeric character reference]]s, both of which allow individual characters to be written via simple markup, rather than literally.@@@@1@43@@danf@17-8-2009 10361200@unknown@formal@none@1@S@A literal character and its markup counterpart are considered equivalent and are rendered identically.@@@@1@14@@danf@17-8-2009 10361210@unknown@formal@none@1@S@The ability to "escape" characters in this way allows for the characters < and & (when written as &lt; and &amp;, respectively) to be interpreted as character data, rather than markup.@@@@1@31@@danf@17-8-2009 10361220@unknown@formal@none@1@S@For example, a literal < normally indicates the start of a tag, and & normally indicates the start of a character entity reference or numeric character reference; writing it as &amp; or &#x26; or &#38; allows & to be included in the content of elements or the values of attributes.@@@@1@50@@danf@17-8-2009 10361230@unknown@formal@none@1@S@The double-quote character ("), when used to quote an attribute value, must also be escaped as &quot; or &#x22; or &#34; when it appears within the attribute value
itself.@@@@1@29@@danf@17-8-2009 10361240@unknown@formal@none@1@S@The single-quote character ('), when used to quote an attribute value, must also be escaped as &#x27; or &#39; (should NOT be escaped as &apos; except in XHTML documents) when it appears within the attribute value itself.@@@@1@37@@danf@17-8-2009 10361250@unknown@formal@none@1@S@However, since document authors often overlook the need to escape these characters, browsers tend to be very forgiving, treating them as markup only when subsequent text appears to confirm that intent.@@@@1@31@@danf@17-8-2009 10361260@unknown@formal@none@1@S@Escaping also allows for characters that are not easily typed or that aren't even available in the document's [[character encoding]] to be represented within the element and attribute content.@@@@1@29@@danf@17-8-2009 10361270@unknown@formal@none@1@S@For example, the acute-accented e (é), a character typically found only on Western European keyboards, can be written in any HTML document as the entity reference &eacute; or as the numeric references &#233; or &#xE9;.@@@@1@35@@danf@17-8-2009 10361280@unknown@formal@none@1@S@The characters comprising those references (that is, the &, the ;, the letters in eacute, and so on) are available on all keyboards and are supported in all character encodings, whereas the literal é is not.@@@@1@36@@danf@17-8-2009 10361290@unknown@formal@none@1@S@===Data types===@@@@1@2@@danf@17-8-2009 10361300@unknown@formal@none@1@S@HTML defines several [[data type]]s for element content, such as script data and stylesheet data, and a plethora of types for attribute values, including IDs, names, URIs, numbers, units of length, languages, media descriptors, colors, character encodings, dates and times, and so on.@@@@1@43@@danf@17-8-2009 10361310@unknown@formal@none@1@S@All of these data types are specializations of character data.@@@@1@10@@danf@17-8-2009 10361320@unknown@formal@none@1@S@===The Document Type Declaration===@@@@1@4@@danf@17-8-2009 10361330@unknown@formal@none@1@S@In order to enable [[Document Type Definition]] (DTD)-based validation with SGML tools and in order to avoid the [[quirks mode]] in browsers, HTML documents can start with a [[Document Type Declaration]] (informally, a "DOCTYPE").@@@@1@34@@danf@17-8-2009 10361340@unknown@formal@none@1@S@The DTD to which the DOCTYPE refers contains machine-readable grammar specifying the permitted and prohibited content for a document conforming to such a DTD.@@@@1@24@@danf@17-8-2009 10361350@unknown@formal@none@1@S@Browsers do not necessarily read the DTD, however.@@@@1@8@@danf@17-8-2009 10361360@unknown@formal@none@1@S@The most popular graphical browsers use DOCTYPE declarations (or the lack thereof) and other data at the beginning of sources to determine which rendering mode to use.@@@@1@27@@danf@17-8-2009 10361370@unknown@formal@none@1@S@For example:@@@@1@2@@danf@17-8-2009 10361380@unknown@formal@none@1@S@:@@@@1@7@@danf@17-8-2009 10361390@unknown@formal@none@1@S@This declaration references the Strict DTD of HTML 4.01, which does not have presentational elements like , leaving formatting to Cascading Style Sheets and the span and div tags.@@@@1@29@@danf@17-8-2009 10361400@unknown@formal@none@1@S@SGML-based validators read the DTD in order to properly parse the document and to perform validation.@@@@1@16@@danf@17-8-2009 10361410@unknown@formal@none@1@S@In modern browsers, the HTML 4.01 Strict doctype activates standards layout mode for [[Cascading Style Sheets|CSS]] as opposed to [[quirks mode]].@@@@1@21@@danf@17-8-2009 
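For reference, the HTML 4.01 Strict document type declaration discussed in this section is written as follows:

 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">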
10361420@unknown@formal@none@1@S@In addition, HTML 4.01 provides Transitional and Frameset DTDs.@@@@1@9@@danf@17-8-2009 10361430@unknown@formal@none@1@S@The Transitional DTD was intended to gradually phase in the changes made in the Strict DTD, while the Frameset DTD was intended for those documents which contained frames.@@@@1@28@@danf@17-8-2009 10361440@unknown@formal@none@1@S@==Semantic HTML==@@@@1@2@@danf@17-8-2009 10361450@unknown@formal@none@1@S@There is no official specification called "Semantic HTML", though the strict flavors of HTML discussed [[#Current flavors of HTML|below]] are a push in that direction.@@@@1@25@@danf@17-8-2009 10361460@unknown@formal@none@1@S@Rather, semantic HTML refers to an objective and a practice to create documents with HTML that contain only the author's intended meaning, without any reference to how this meaning is presented or conveyed.@@@@1@33@@danf@17-8-2009 10361470@unknown@formal@none@1@S@A classic example is the distinction between the emphasis element (<em>) and the italics element (<i>).@@@@1@16@@danf@17-8-2009 10361480@unknown@formal@none@1@S@Often the emphasis element is displayed in italics, so the presentation is typically the same.@@@@1@15@@danf@17-8-2009 10361490@unknown@formal@none@1@S@However, emphasizing something is different from listing the title of a book, for example, which may also be displayed in italics.@@@@1@21@@danf@17-8-2009 10361500@unknown@formal@none@1@S@In purely semantic HTML, a book title would use a different element than emphasized text uses (for example a <span>), because they are meaningfully different things.@@@@1@26@@danf@17-8-2009 10361510@unknown@formal@none@1@S@The goal of semantic HTML requires two things of authors:@@@@1@10@@danf@17-8-2009 10361520@unknown@formal@none@1@S@# To avoid the use of presentational markup (elements, attributes, and other entities).@@@@1@13@@danf@17-8-2009 10361530@unknown@formal@none@1@S@# To use available markup to differentiate the meanings of phrases and structure in the document.@@@@1@16@@danf@17-8-2009 10361540@unknown@formal@none@1@S@So for example, the book title from above would need to have its own element and class specified, such as <cite class="booktitle">The Grapes of Wrath</cite>.@@@@1@25@@danf@17-8-2009 10361545@unknown@formal@none@1@S@Here, the <cite> element is used because it most closely matches the meaning of this phrase in the text.@@@@1@19@@danf@17-8-2009 10361550@unknown@formal@none@1@S@However, the <cite> element is not specific enough to this task, since we mean to cite specifically a book title as opposed to a newspaper article or an academic journal.@@@@1@30@@danf@17-8-2009 10361560@unknown@formal@none@1@S@Semantic HTML also requires complementary specifications and software compliance with these specifications.@@@@1@12@@danf@17-8-2009 10361570@unknown@formal@none@1@S@Primarily, the development and proliferation of [[Cascading Style Sheets|CSS]] has led to increasing support for semantic HTML, because CSS provides designers with a rich language to alter the presentation of semantic-only documents.@@@@1@32@@danf@17-8-2009 10361580@unknown@formal@none@1@S@With the development of CSS, the need to include presentational properties in a document has virtually disappeared.@@@@1@17@@danf@17-8-2009 10361590@unknown@formal@none@1@S@With the advent and refinement of CSS and the increasing support for it in Web browsers, subsequent editions of HTML increasingly stress only using markup that suggests the semantic structure and phrasing of the document, like headings, 
paragraphs, quotes, and lists, instead of using markup which is written for visual purposes only, like <font>, <b> (bold), and <i> (italics).@@@@1@59@@danf@17-8-2009 10361600@unknown@formal@none@1@S@Some of these elements are not permitted in certain varieties of HTML, like HTML 4.01 Strict.@@@@1@16@@danf@17-8-2009 10361610@unknown@formal@none@1@S@CSS provides a way to separate document semantics from the content's presentation, by keeping everything relevant to presentation defined in a CSS file.@@@@1@23@@danf@17-8-2009 10361620@unknown@formal@none@1@S@See [[separation of style and content]].@@@@1@6@@danf@17-8-2009 10361630@unknown@formal@none@1@S@Semantic HTML offers many advantages.@@@@1@5@@danf@17-8-2009 10361640@unknown@formal@none@1@S@First, it ensures consistency in style across elements that have the same meaning.@@@@1@13@@danf@17-8-2009 10361650@unknown@formal@none@1@S@Every heading, every quotation, every similar element receives the same presentation properties.@@@@1@12@@danf@17-8-2009 10361660@unknown@formal@none@1@S@Second, semantic HTML frees authors from the need to concern themselves with presentation details.@@@@1@14@@danf@17-8-2009 10361670@unknown@formal@none@1@S@When writing the number two, for example, should it be written out in words ("two"), or should it be written as a numeral (2)?@@@@1@24@@danf@17-8-2009 10361680@unknown@formal@none@1@S@A semantic markup might enter something like 2 and leave presentation details to the stylesheet designers.@@@@1@16@@danf@17-8-2009 10361690@unknown@formal@none@1@S@Similarly, an author might wonder where to break out quotations into separate indented blocks of text: with purely semantic HTML, such details would be left up to stylesheet designers.@@@@1@29@@danf@17-8-2009 10361700@unknown@formal@none@1@S@Authors would simply indicate quotations when they occur in the text, and not concern themselves with presentation.@@@@1@17@@danf@17-8-2009 10361710@unknown@formal@none@1@S@A third advantage is device independence and repurposing of documents.@@@@1@10@@danf@17-8-2009 10361720@unknown@formal@none@1@S@A semantic HTML document can be paired with any number of stylesheets to provide output to computer screens (through Web browsers), high-resolution printers, handheld devices, aural browsers or braille devices for those with visual impairments, and so on.@@@@1@38@@danf@17-8-2009 10361730@unknown@formal@none@1@S@To accomplish this, nothing needs to be changed in a well-coded semantic HTML document.@@@@1@14@@danf@17-8-2009 10361740@unknown@formal@none@1@S@Readily available stylesheets make this a simple matter of pairing a semantic HTML document with the appropriate stylesheets.@@@@1@18@@danf@17-8-2009 10361750@unknown@formal@none@1@S@(Of course, the stylesheet's selectors need to match the appropriate properties in the HTML document.)@@@@1@15@@danf@17-8-2009 10361760@unknown@formal@none@1@S@Some aspects of authoring documents make separating semantics from style (in other words, meaning from presentation) difficult.@@@@1@17@@danf@17-8-2009 10361770@unknown@formal@none@1@S@Some elements are hybrids, using presentation in their very meaning.@@@@1@10@@danf@17-8-2009 10361780@unknown@formal@none@1@S@For example, a table displays content in a tabular form.@@@@1@10@@danf@17-8-2009 10361790@unknown@formal@none@1@S@Often such content conveys the meaning only when presented in this way.@@@@1@12@@danf@17-8-2009 10361800@unknown@formal@none@1@S@Repurposing a table for an aural device typically involves somehow presenting the table as an inherently 
visual element in an audible form.@@@@1@22@@danf@17-8-2009 10361810@unknown@formal@none@1@S@On the other hand, we frequently present lyrical songs—something inherently meant for audible presentation—and instead present them in textual form on a Web page.@@@@1@24@@danf@17-8-2009 10361820@unknown@formal@none@1@S@For these types of elements, the meaning is not so easily separated from their presentation.@@@@1@15@@danf@17-8-2009 10361830@unknown@formal@none@1@S@However, for a great many of the elements used and meanings conveyed in HTML, the translation is relatively smooth.@@@@1@19@@danf@17-8-2009 10361840@unknown@formal@none@1@S@==Delivery of HTML==@@@@1@3@@danf@17-8-2009 10361850@unknown@formal@none@1@S@HTML documents can be delivered by the same means as any other computer file; however, they are most often delivered in one of two forms: over [[HTTP]] servers and through e-mail.@@@@1@31@@danf@17-8-2009 10361860@unknown@formal@none@1@S@===Publishing HTML with HTTP===@@@@1@4@@danf@17-8-2009 10361870@unknown@formal@none@1@S@The [[World Wide Web]] is composed primarily of HTML documents transmitted from a [[Web server]] to a Web browser using the [[Hypertext Transfer Protocol]] (HTTP).@@@@1@25@@danf@17-8-2009 10361880@unknown@formal@none@1@S@However, HTTP can be used to serve images, sound, and other content in addition to HTML.@@@@1@16@@danf@17-8-2009 10361890@unknown@formal@none@1@S@To allow the Web browser to know how to handle the document it received, an indication of the [[file format]] of the document must be transmitted along with the document.@@@@1@30@@danf@17-8-2009 10361900@unknown@formal@none@1@S@This vital [[metadata]] includes the [[MIME]] type (text/html for HTML 4.01 and earlier, application/xhtml+xml for XHTML 1.0 and later) and the character encoding (see [[Character encodings in HTML]]).@@@@1@28@@danf@17-8-2009 10361910@unknown@formal@none@1@S@In modern browsers, the MIME type that is sent with the HTML document affects how the document is interpreted.@@@@1@19@@danf@17-8-2009 10361920@unknown@formal@none@1@S@A document sent with an XHTML MIME type, or ''served as application/xhtml+xml'', is expected to be [[XML#Well-formed documents|well-formed]] XML, and a syntax error causes the browser to fail to render the document.@@@@1@32@@danf@17-8-2009 10361930@unknown@formal@none@1@S@The same document sent with an HTML MIME type, or ''served as text/html'', might be displayed successfully, since Web browsers are more lenient with HTML.@@@@1@25@@danf@17-8-2009 10361940@unknown@formal@none@1@S@However, XHTML parsed in this way is not considered either proper XHTML or HTML, but so-called [[tag soup]].@@@@1@18@@danf@17-8-2009 10361950@unknown@formal@none@1@S@If the MIME type is not recognized as HTML, the Web browser should not attempt to render the document as HTML, even if the document is prefaced with a correct Document Type Declaration.@@@@1@33@@danf@17-8-2009 10361960@unknown@formal@none@1@S@Nevertheless, some Web browsers do examine the contents or URL of the document and attempt to infer the file type, despite this being forbidden by the HTTP 1.1 specification.@@@@1@29@@danf@17-8-2009 10361970@unknown@formal@none@1@S@===HTML e-mail===@@@@1@2@@danf@17-8-2009 10361980@unknown@formal@none@1@S@Most graphical [[e-mail]] clients allow the use of a subset of HTML (often ill-defined) to provide formatting and [[semantic web|semantic]] markup capabilities not available with [[plain text]], like emphasized text, block quotations for replies, and diagrams or mathematical formulas that could 
not easily be described otherwise.@@@@1@46@@danf@17-8-2009 10361990@unknown@formal@none@1@S@Many of these clients include both a [[GUI]] editor for composing HTML e-mail messages and a rendering engine for displaying received HTML messages.@@@@1@23@@danf@17-8-2009 10362000@unknown@formal@none@1@S@Use of HTML in e-mail is controversial because of compatibility issues, because it can be used in [[phishing]]/privacy attacks, because it can confuse [[E-Mail spam|spam]] filters, and because the message size is larger than plain text.@@@@1@36@@danf@17-8-2009 10362010@unknown@formal@none@1@S@===Naming conventions===@@@@1@2@@danf@17-8-2009 10362020@unknown@formal@none@1@S@The most common [[filename extension]] for [[computer file|files]] containing HTML is .html.@@@@1@12@@danf@17-8-2009 10362030@unknown@formal@none@1@S@A common abbreviation of this is .htm; it originates from older operating systems and file systems, such as the [[DOS]] versions from the 80s and early 90s and [[File Allocation Table|FAT]], which limit file extensions to three letters.@@@@1@38@@danf@17-8-2009 10362040@unknown@formal@none@1@S@Both forms are widely supported by browsers.@@@@1@7@@danf@17-8-2009 10362050@unknown@formal@none@1@S@==Current flavors of HTML==@@@@1@4@@danf@17-8-2009 10362060@unknown@formal@none@1@S@Since its inception, HTML and its associated protocols gained acceptance relatively quickly.@@@@1@12@@danf@17-8-2009 10362070@unknown@formal@none@1@S@However, no clear standards existed in the early years of the language.@@@@1@12@@danf@17-8-2009 10362080@unknown@formal@none@1@S@Though its creators originally conceived of HTML as a semantic language devoid of presentation details, practical uses pushed many presentational elements and attributes into the language, driven largely by the various browser vendors.@@@@1@33@@danf@17-8-2009 10362090@unknown@formal@none@1@S@The latest standards surrounding HTML reflect efforts to overcome the sometimes chaotic development of the language and to create a rational foundation for building both meaningful and well-presented documents.@@@@1@29@@danf@17-8-2009 10362100@unknown@formal@none@1@S@To return HTML to its role as a semantic language, the [[World Wide Web Consortium|W3C]] has developed style languages such as [[Cascading Style Sheets|CSS]] and [[Extensible Stylesheet Language|XSL]] to shoulder the burden of presentation.@@@@1@34@@danf@17-8-2009 10362110@unknown@formal@none@1@S@In conjunction, the HTML specification has slowly reined in the presentational elements.@@@@1@12@@danf@17-8-2009 10362120@unknown@formal@none@1@S@There are two axes differentiating various flavors of HTML as currently specified: SGML-based HTML versus XML-based HTML (referred to as XHTML) on the one axis, and strict versus transitional (loose) versus frameset on the other axis.@@@@1@36@@danf@17-8-2009 10362130@unknown@formal@none@1@S@===SGML-based versus XML-based HTML===@@@@1@4@@danf@17-8-2009 10362140@unknown@formal@none@1@S@One difference in the latest HTML specifications lies in the distinction between the SGML-based specification and the XML-based specification.@@@@1@19@@danf@17-8-2009 10362150@unknown@formal@none@1@S@The XML-based specification is usually called XHTML to distinguish it clearly from the more traditional definition; however, the root element name continues to be 'html' even in the XHTML-specified HTML.@@@@1@30@@danf@17-8-2009 10362160@unknown@formal@none@1@S@The W3C intended XHTML 1.0 to be identical to HTML 4.01 except where limitations of XML over the more complex SGML 
require workarounds.@@@@1@23@@danf@17-8-2009 10362170@unknown@formal@none@1@S@Because XHTML and HTML are closely related, they are sometimes documented in parallel.@@@@1@13@@danf@17-8-2009 10362180@unknown@formal@none@1@S@In such circumstances, some authors conflate the two names as (X)HTML or X(HTML).@@@@1@13@@danf@17-8-2009 10362190@unknown@formal@none@1@S@Like HTML 4.01, XHTML 1.0 has three sub-specifications: strict, loose, and frameset.@@@@1@12@@danf@17-8-2009 10362200@unknown@formal@none@1@S@Aside from the different opening declarations for a document, the differences between an HTML 4.01 and XHTML 1.0 document—in each of the corresponding DTDs—are largely syntactic.@@@@1@26@@danf@17-8-2009 10362210@unknown@formal@none@1@S@The underlying syntax of HTML allows many shortcuts that XHTML does not, such as elements with optional opening or closing tags, and even EMPTY elements which must not have an end tag.@@@@1@32@@danf@17-8-2009 10362220@unknown@formal@none@1@S@By contrast, XHTML requires all elements to have an opening tag and a closing tag.@@@@1@15@@danf@17-8-2009 10362230@unknown@formal@none@1@S@XHTML, however, also introduces a new shortcut: an XHTML element may be opened and closed within a single tag, by including a slash before the end of the tag like this: <br/>.@@@@1@32@@danf@17-8-2009 10362240@unknown@formal@none@1@S@The introduction of this shorthand, which is not used in the SGML declaration for HTML 4.01, may confuse earlier software unfamiliar with this new convention.@@@@1@25@@danf@17-8-2009 10362250@unknown@formal@none@1@S@To understand the subtle differences between HTML and XHTML, consider the transformation of a valid and well-formed XHTML 1.0 document that adheres to Appendix C (see below) into a valid HTML 4.01 document.@@@@1@33@@danf@17-8-2009 10362260@unknown@formal@none@1@S@Making this translation requires the following steps:@@@@1@8@@danf@17-8-2009 10362270@unknown@formal@none@1@S@# '''The language for an element should be specified with a lang attribute rather than the XHTML xml:lang attribute.'''@@@@1@19@@danf@17-8-2009 10362280@unknown@formal@none@1@S@XHTML uses XML's built-in language-defining attribute.@@@@1@8@@danf@17-8-2009 10362290@unknown@formal@none@1@S@# '''Remove the XML namespace (xmlns=URI).'''@@@@1@6@@danf@17-8-2009 10362300@unknown@formal@none@1@S@HTML has no facilities for namespaces.@@@@1@6@@danf@17-8-2009 10362310@unknown@formal@none@1@S@# '''Change the document type declaration''' from XHTML 1.0 to HTML 4.01. 
(see [[#The Document Type Definition|DTD section]] for further explanation).@@@@1@21@@danf@17-8-2009 10362320@unknown@formal@none@1@S@# If present, '''remove the XML declaration.'''@@@@1@7@@danf@17-8-2009 10362330@unknown@formal@none@1@S@(Typically this is: <?xml version="1.0" encoding="UTF-8"?>).@@@@1@6@@danf@17-8-2009 10362340@unknown@formal@none@1@S@# '''Ensure that the document’s MIME type is set to text/html.'''@@@@1@11@@danf@17-8-2009 10362350@unknown@formal@none@1@S@For both HTML and XHTML, this comes from the HTTP Content-Type header sent by the server.@@@@1@16@@danf@17-8-2009 10362360@unknown@formal@none@1@S@# '''Change the XML empty-element syntax to an HTML-style empty element''' (<br/> to <br>).@@@@1@15@@danf@17-8-2009 10362370@unknown@formal@none@1@S@Those are the main changes necessary to translate a document from XHTML 1.0 to HTML 4.01.@@@@1@16@@danf@17-8-2009 10362380@unknown@formal@none@1@S@To translate from HTML to XHTML would also require the addition of any omitted opening or closing tags.@@@@1@18@@danf@17-8-2009 10362390@unknown@formal@none@1@S@Whether coding in HTML or XHTML, it may be best always to include the optional tags within an HTML document rather than remembering which tags can be omitted.@@@@1@29@@danf@17-8-2009 10362400@unknown@formal@none@1@S@A well-formed XHTML document adheres to all the syntax requirements of XML.@@@@1@12@@danf@17-8-2009 10362410@unknown@formal@none@1@S@A valid document adheres to the content specification for XHTML, which describes the document structure.@@@@1@15@@danf@17-8-2009 10362420@unknown@formal@none@1@S@The W3C recommends several conventions to ensure an easy migration between HTML and XHTML (see [http://www.w3.org/TR/xhtml1/#guidelines HTML Compatibility Guidelines]).@@@@1@19@@danf@17-8-2009 10362430@unknown@formal@none@1@S@The following steps can be applied to XHTML 1.0 documents only:@@@@1@11@@danf@17-8-2009 10362440@unknown@formal@none@1@S@* Include both xml:lang and lang attributes on any elements assigning language.@@@@1@12@@danf@17-8-2009 10362450@unknown@formal@none@1@S@* Use the empty-element syntax only for elements specified as empty in HTML.@@@@1@13@@danf@17-8-2009 10362460@unknown@formal@none@1@S@* Include an extra space in empty-element tags: for example <br /> instead of <br/>.@@@@1@14@@danf@17-8-2009 10362470@unknown@formal@none@1@S@* Include explicit close tags for elements that permit content but are left empty (for example, <div></div>, not <div />).@@@@1@20@@danf@17-8-2009 10362480@unknown@formal@none@1@S@* Omit the XML declaration.@@@@1@5@@danf@17-8-2009 10362490@unknown@formal@none@1@S@By carefully following the W3C’s compatibility guidelines, a user agent should be able to interpret the document equally as HTML or XHTML.@@@@1@22@@danf@17-8-2009 10362500@unknown@formal@none@1@S@For documents that are XHTML 1.0 and have been made compatible in this way, the W3C permits them to be served either as HTML (with a text/html [[MIME type]]), or as XHTML (with an application/xhtml+xml or application/xml MIME type).@@@@1@39@@danf@17-8-2009 10362510@unknown@formal@none@1@S@When delivered as XHTML, browsers should use an XML parser, which adheres strictly to the XML specifications for parsing the document's contents.@@@@1@22@@danf@17-8-2009 10362520@unknown@formal@none@1@S@===Transitional versus Strict===@@@@1@4@@danf@17-8-2009 10362530@unknown@formal@none@1@S@The latest SGML-based specification HTML 4.01 and the earliest XHTML version include three sub-specifications: Strict, Transitional (once called Loose), and Frameset.@@@@1@21@@danf@17-8-2009 
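For illustration, the variant that a given document uses is signalled by the document type declaration at the top of the document; the following minimal sketch lists the three standard W3C declarations for HTML 4.01 (the XHTML 1.0 DTDs follow the same Strict/Transitional/Frameset pattern):

 <!-- Strict: most presentational elements and attributes are omitted -->
 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
 <!-- Transitional (loose): legacy presentational markup remains permitted -->
 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
 <!-- Frameset: as Transitional, but with a frameset element replacing body -->
 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" "http://www.w3.org/TR/html4/frameset.dtd">

Which of these declarations a document carries also determines the DTD against which a validator checks it.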
10362540@unknown@formal@none@1@S@The Strict variant represents the standard proper, whereas the Transitional and Frameset variants were developed to assist in the transition from earlier versions of HTML (including HTML 3.2).@@@@1@28@@danf@17-8-2009 10362550@unknown@formal@none@1@S@The Transitional and Frameset variants allow for [[presentational markup]] whereas the Strict variant encourages the use of style sheets through its omission of most presentational markup.@@@@1@26@@danf@17-8-2009 10362560@unknown@formal@none@1@S@The primary differences which make the Transitional variant more permissive than the Strict variant (the differences are the same in HTML 4 and XHTML 1.0) are:@@@@1@26@@danf@17-8-2009 10362570@unknown@formal@none@1@S@* '''A looser content model'''@@@@1@5@@danf@17-8-2009 10362580@unknown@formal@none@1@S@** Inline elements and plain text (#PCDATA) are allowed directly in: body, blockquote, form, noscript and noframes@@@@1@17@@danf@17-8-2009 10362590@unknown@formal@none@1@S@* '''Presentation-related elements'''@@@@1@4@@danf@17-8-2009 10362600@unknown@formal@none@1@S@** underline (u)@@@@1@3@@danf@17-8-2009 10362610@unknown@formal@none@1@S@** strike-through (s and strike)@@@@1@3@@danf@17-8-2009 10362620@unknown@formal@none@1@S@** center@@@@1@2@@danf@17-8-2009 10362630@unknown@formal@none@1@S@** font@@@@1@2@@danf@17-8-2009 10362640@unknown@formal@none@1@S@** basefont@@@@1@2@@danf@17-8-2009 10362650@unknown@formal@none@1@S@* '''Presentation-related attributes'''@@@@1@4@@danf@17-8-2009 10362660@unknown@formal@none@1@S@** background and bgcolor attributes for body element.@@@@1@8@@danf@17-8-2009 10362670@unknown@formal@none@1@S@** align attribute on div, form, paragraph (p), and heading (h1...h6) elements@@@@1@12@@danf@17-8-2009 10362680@unknown@formal@none@1@S@** align, noshade, size, and width attributes on hr element@@@@1@10@@danf@17-8-2009 10362690@unknown@formal@none@1@S@** align, border, vspace, and hspace attributes on img and object elements@@@@1@12@@danf@17-8-2009 10362700@unknown@formal@none@1@S@** align attribute on legend and caption elements@@@@1@8@@danf@17-8-2009 10362710@unknown@formal@none@1@S@** align and bgcolor on table element@@@@1@7@@danf@17-8-2009 10362720@unknown@formal@none@1@S@** nowrap, bgcolor, width, height on td and th elements@@@@1@10@@danf@17-8-2009 10362730@unknown@formal@none@1@S@** bgcolor attribute on tr element@@@@1@6@@danf@17-8-2009 10362740@unknown@formal@none@1@S@** clear attribute on br element@@@@1@6@@danf@17-8-2009 10362750@unknown@formal@none@1@S@** compact attribute on dl, dir and menu elements@@@@1@9@@danf@17-8-2009 10362760@unknown@formal@none@1@S@** type, compact, and start attributes on ol and ul elements@@@@1@11@@danf@17-8-2009 10362770@unknown@formal@none@1@S@** type and value attributes on li element@@@@1@8@@danf@17-8-2009 10362780@unknown@formal@none@1@S@** width attribute on pre element@@@@1@6@@danf@17-8-2009 10362790@unknown@formal@none@1@S@* '''Additional elements in Transitional specification'''@@@@1@6@@danf@17-8-2009 10362800@unknown@formal@none@1@S@** menu list (no substitute, though unordered list is recommended; may return in XHTML 2.0 specification)@@@@1@16@@danf@17-8-2009 10362810@unknown@formal@none@1@S@** dir list (no substitute, though unordered list is recommended)@@@@1@10@@danf@17-8-2009 10362820@unknown@formal@none@1@S@** isindex (element requires server-side support and is typically added to documents server-side)@@@@1@13@@danf@17-8-2009 10362830@unknown@formal@none@1@S@** applet (deprecated in favor of object 
element)@@@@1@8@@danf@17-8-2009 10362840@unknown@formal@none@1@S@* '''The language attribute on script element''' (presumably redundant with type attribute, though this is maintained for legacy reasons).@@@@1@19@@danf@17-8-2009 10362850@unknown@formal@none@1@S@* '''Frame-related entities'''@@@@1@4@@danf@17-8-2009 10362860@unknown@formal@none@1@S@** frameset element (used in place of body for frameset DTD)@@@@1@11@@danf@17-8-2009 10362870@unknown@formal@none@1@S@** frame element@@@@1@3@@danf@17-8-2009 10362880@unknown@formal@none@1@S@** iframe@@@@1@2@@danf@17-8-2009 10362890@unknown@formal@none@1@S@** noframes@@@@1@2@@danf@17-8-2009 10362900@unknown@formal@none@1@S@** target attribute on anchor, client-side image-map (imagemap), link, form, and base elements@@@@1@13@@danf@17-8-2009 10362910@unknown@formal@none@1@S@===Frameset versus transitional===@@@@1@3@@danf@17-8-2009 10362920@unknown@formal@none@1@S@In addition to the above transitional differences, the frameset specifications (whether XHTML 1.0 or HTML 4.01) specify a different content model: the frameset element takes the place of body and contains frame elements (and, optionally, a noframes element containing a body).@@@@1@32@@danf@17-8-2009 10362930@unknown@formal@none@1@S@=== Summary of flavors ===@@@@1@5@@danf@17-8-2009 10362940@unknown@formal@none@1@S@As this list demonstrates, the loose flavors of the specification are maintained for legacy support.@@@@1@15@@danf@17-8-2009 10362950@unknown@formal@none@1@S@However, contrary to popular misconceptions, the move to XHTML does not imply a removal of this legacy support.@@@@1@18@@danf@17-8-2009 10362960@unknown@formal@none@1@S@Rather, the X in XML stands for extensible, and the W3C is modularizing the entire specification and opening it up to independent extensions.@@@@1@23@@danf@17-8-2009 10362970@unknown@formal@none@1@S@The primary achievement in the move from XHTML 1.0 to XHTML 1.1 is the modularization of the entire specification.@@@@1@19@@danf@17-8-2009 10362980@unknown@formal@none@1@S@The strict version of HTML is deployed in XHTML 1.1 through a set of modular extensions to the base XHTML 1.1 specification.@@@@1@22@@danf@17-8-2009 10362990@unknown@formal@none@1@S@Likewise, someone looking for the loose (transitional) or frameset specifications will find similar extended XHTML 1.1 support (much of it is contained in the legacy or frame modules).@@@@1@28@@danf@17-8-2009 10363000@unknown@formal@none@1@S@The modularization also allows for separate features to develop on their own timetable.@@@@1@13@@danf@17-8-2009 10363010@unknown@formal@none@1@S@So, for example, XHTML 1.1 will allow quicker migration to emerging XML standards such as [[MathML]] (a presentational and semantic math language based on XML) and [[XForms]] — a new highly advanced web-form technology to replace the existing HTML forms.@@@@1@40@@danf@17-8-2009 10363020@unknown@formal@none@1@S@In summary, the HTML 4.01 specification primarily reined in all the various HTML implementations into a single, clearly written specification based on SGML.@@@@1@23@@danf@17-8-2009 10363030@unknown@formal@none@1@S@XHTML 1.0 ported this specification, as is, to the new XML-defined specification.@@@@1@13@@danf@17-8-2009 10363040@unknown@formal@none@1@S@Next, XHTML 1.1 takes advantage of the extensible nature of XML and modularizes the whole specification.@@@@1@16@@danf@17-8-2009 10363050@unknown@formal@none@1@S@XHTML 2.0 will be the first step in adding new features to the specification in a standards-body-based approach.@@@@1@18@@danf@17-8-2009 10363060@unknown@formal@none@1@S@== Hypertext features not in HTML ==@@@@1@7@@danf@17-8-2009 
10363070@unknown@formal@none@1@S@HTML lacks some of the features found in earlier hypertext systems, such as [[typed link]]s, [[transclusion]], [[source tracking]], [[fat link]]s, and more.@@@@1@22@@danf@17-8-2009 10363080@unknown@formal@none@1@S@Even some hypertext features that were in early versions of HTML have been ignored by most popular web browsers until recently, such as the [[Hyperlink|link]] element and in-browser Web page editing.@@@@1@31@@danf@17-8-2009 10363090@unknown@formal@none@1@S@Sometimes Web services or browser manufacturers remedy these shortcomings.@@@@1@9@@danf@17-8-2009 10363100@unknown@formal@none@1@S@For instance, [[wiki]]s and [[content management system]]s allow surfers to edit the Web pages they visit.@@@@1@16@@danf@17-8-2009 10370010@unknown@formal@none@1@S@
IBM
@@@@1@1@@danf@17-8-2009 10370020@unknown@formal@none@1@S@'''International Business Machines Corporation,''' abbreviated '''IBM''' and nicknamed '''"Big Blue"''', is a [[multinational corporation|multinational]] [[computer]] [[technology]] and [[consulting]] [[corporation]] headquartered in [[Armonk, New York]], [[United States of America|USA]].@@@@1@29@@danf@17-8-2009 10370030@unknown@formal@none@1@S@The company is one of the few [[information technology]] companies with a continuous history dating back to the 19th century.@@@@1@20@@danf@17-8-2009 10370040@unknown@formal@none@1@S@IBM manufactures and sells computer [[computer hardware|hardware]] and [[computer software|software]], and offers infrastructure services, [[Internet hosting service|hosting services]], and [[consultant|consulting services]] in areas ranging from [[mainframe computer]]s to [[nanotechnology]].@@@@1@29@@danf@17-8-2009 10370050@unknown@formal@none@1@S@IBM has been known through most of its recent history as the world's largest computer company; with over 388,000 employees worldwide, IBM is the largest [[information technology]] employer in the world.@@@@1@31@@danf@17-8-2009 10370060@unknown@formal@none@1@S@Despite falling behind [[Hewlett-Packard]] in total revenue since 2006, it remains the most profitable.@@@@1@14@@danf@17-8-2009 10370070@unknown@formal@none@1@S@IBM holds more [[patent]]s than any other U.S.-based technology company.@@@@1@11@@danf@17-8-2009 10370080@unknown@formal@none@1@S@It has engineers and consultants in over 170 countries, and [[IBM Research]] has eight laboratories worldwide.@@@@1@16@@danf@17-8-2009 10370090@unknown@formal@none@1@S@IBM employees have earned three [[Nobel Prize]]s, four [[Turing Award]]s, five [[National Medal of Technology|National Medals of Technology]], and five [[National Medal of Science|National Medals of Science]].@@@@1@27@@danf@17-8-2009 10370100@unknown@formal@none@1@S@As a chip maker, IBM has been among the [[Worldwide Top 20 Semiconductor Sales Leaders]] in past years, and in 2007 IBM ranked second in the list of largest software companies in the world.@@@@1@34@@danf@17-8-2009 10370110@unknown@formal@none@1@S@==History==@@@@1@1@@danf@17-8-2009 10370120@unknown@formal@none@1@S@The company which became IBM was founded in 1896 as the Tabulating Machine Company by [[Herman Hollerith]], in [[Broome County, New York]] ([[Endicott, New York]], where it still maintains very limited operations).@@@@1@32@@danf@17-8-2009 10370130@unknown@formal@none@1@S@It was incorporated as [[Computing Tabulating Recording Corporation (CTR)]] on [[June 16]], [[1911]], and was listed on the [[New York Stock Exchange]] in 1916.@@@@1@24@@danf@17-8-2009 10370140@unknown@formal@none@1@S@IBM adopted its current name in 1924, when it became a [[Fortune 500]] company.@@@@1@14@@danf@17-8-2009 10370150@unknown@formal@none@1@S@In the 1950s, IBM became the dominant vendor in the emerging [[computer]] industry with the release of the [[IBM 701]] and other models in the [[IBM 700/7000 series]] of [[mainframes]].@@@@1@30@@danf@17-8-2009 10370160@unknown@formal@none@1@S@The company's dominance became even more pronounced in the 1960s and 1970s with the [[IBM System/360]] and [[IBM System/370]] mainframes; however, antitrust actions by the [[United States Department of Justice]], the rise of [[minicomputer]] companies like [[Digital Equipment Corporation]] and [[Data General]], and the introduction of the [[microprocessor]] all contributed to dilution of IBM's position in the industry, eventually leading the 
company to diversify into other areas including personal computers, software, and services.@@@@1@73@@danf@17-8-2009 10370170@unknown@formal@none@1@S@In 1981, IBM introduced the [[IBM Personal Computer]], which is the original version and progenitor of the [[IBM PC compatible]] hardware [[platform (computing)|platform]].@@@@1@23@@danf@17-8-2009 10370180@unknown@formal@none@1@S@Descendants of the IBM PC compatibles make up the majority of [[microcomputer]]s on the market today.@@@@1@16@@danf@17-8-2009 10370190@unknown@formal@none@1@S@IBM sold its PC division to the Chinese company [[Lenovo]] on [[May 1]], [[2005]] for $655 million in cash and $600 million in Lenovo stock.@@@@1@25@@danf@17-8-2009 10370200@unknown@formal@none@1@S@On [[January 25]], [[2007]], [[Ricoh]] announced the purchase of IBM's Printing Systems Division for $725 million and an investment in a 3-year joint venture to form a new Ricoh subsidiary, [[InfoPrint Solutions Company]]; Ricoh will own a 51% share, and IBM will own a 49% share in ''InfoPrint''.@@@@1@45@@danf@17-8-2009 10370210@unknown@formal@none@1@S@===Controversies===@@@@1@1@@danf@17-8-2009 10370220@unknown@formal@none@1@S@The author [[Edwin Black]] has alleged that, during [[World War II]], IBM CEO [[Thomas J. Watson]] used overseas subsidiaries to provide the [[Third Reich]] with [[Unit record equipment|unit record]] [[data processing]] machines, supplies and services that helped the [[Nazis]] to efficiently track down European Jews, with sizable profits for the company.@@@@1@51@@danf@17-8-2009 10370230@unknown@formal@none@1@S@IBM denies that it had control over these subsidiaries after the Nazis took power.@@@@1@14@@danf@17-8-2009 10370240@unknown@formal@none@1@S@A lawsuit against IBM based on these allegations was dismissed.@@@@1@10@@danf@17-8-2009 10370250@unknown@formal@none@1@S@In support of the Allied war effort in World War II, from 1943 to 1945 IBM produced approximately 346,500 M1 Carbine (Caliber .30 carbine) light rifles for the U.S. 
Military.@@@@1@30@@danf@17-8-2009 10370260@unknown@formal@none@1@S@==Current projects==@@@@1@2@@danf@17-8-2009 10370270@unknown@formal@none@1@S@===Eclipse===@@@@1@1@@danf@17-8-2009 10370280@unknown@formal@none@1@S@Eclipse is a platform-independent, [[Java (programming language)|Java]]-based [[software framework]].@@@@1@9@@danf@17-8-2009 10370290@unknown@formal@none@1@S@Eclipse was originally a [[Proprietary software|proprietary]] product developed by IBM as a successor of the [[VisualAge]] family of tools.@@@@1@19@@danf@17-8-2009 10370300@unknown@formal@none@1@S@Eclipse has subsequently been released as [[free software|free]]/[[open source]] software under the [[Eclipse Public License]].@@@@1@15@@danf@17-8-2009 10370310@unknown@formal@none@1@S@===developerWorks===@@@@1@1@@danf@17-8-2009 10370320@unknown@formal@none@1@S@developerWorks is a website run by [[IBM]] for [[software developer]]s and IT professionals.@@@@1@13@@danf@17-8-2009 10370330@unknown@formal@none@1@S@It contains a large number of how-to articles and tutorials, as well as software downloads and code samples, discussion forums, podcasts, blogs, wikis, and other resources for developers and technical professionals.@@@@1@31@@danf@17-8-2009 10370340@unknown@formal@none@1@S@Subjects range from open, industry-standard technologies like [[Java (programming language)|Java]], [[Linux]], [[Service-oriented architecture|SOA]] and [[web services]], [[web development]], [[Ajax (programming)|Ajax]], [[PHP]], and [[XML]] to IBM's products ([[WebSphere]], [[Rational Software|Rational]], [[Lotus Software|Lotus]], [[Tivoli Systems, Inc.|Tivoli]] and [[IBM DB2|DB2]]).@@@@1@37@@danf@17-8-2009 10370350@unknown@formal@none@1@S@In 2007 developerWorks was inducted into the Jolt Hall of Fame.@@@@1@11@@danf@17-8-2009 10370360@unknown@formal@none@1@S@===alphaWorks===@@@@1@1@@danf@17-8-2009 10370370@unknown@formal@none@1@S@alphaWorks is IBM's source for emerging software technologies.@@@@1@8@@danf@17-8-2009 10370380@unknown@formal@none@1@S@These technologies include:@@@@1@3@@danf@17-8-2009 10370390@unknown@formal@none@1@S@*'''Flexible Internet Evaluation Report Architecture''' - A highly flexible architecture for the design, display, and reporting of Internet surveys.@@@@1@19@@danf@17-8-2009 10370400@unknown@formal@none@1@S@*'''[[IBM History Flow tool|IBM History Flow Visualization Application]]''' - A tool for visualizing dynamic, evolving documents and the interactions of multiple collaborating authors.@@@@1@23@@danf@17-8-2009 10370410@unknown@formal@none@1@S@*'''IBM [[Linux]] on POWER Performance Simulator''' - A tool that provides users of Linux on Power a set of performance models for IBM's POWER processors.@@@@1@25@@danf@17-8-2009 10370420@unknown@formal@none@1@S@*'''Database File Archive And Restoration Management''' - An application for archiving and restoring hard disk files using file references stored in a database.@@@@1@23@@danf@17-8-2009 10370430@unknown@formal@none@1@S@*'''Policy Management for Autonomic Computing''' - A policy-based autonomic management infrastructure that simplifies the automation of IT and business processes.@@@@1@20@@danf@17-8-2009 10370440@unknown@formal@none@1@S@*'''FairUCE''' - A spam filter that verifies sender identity instead of filtering content.@@@@1@13@@danf@17-8-2009 10370450@unknown@formal@none@1@S@*'''Unstructured Information Management Architecture (UIMA) SDK''' - A Java SDK that supports the implementation, composition, and deployment of applications working with unstructured 
information.@@@@1@23@@danf@17-8-2009 10370460@unknown@formal@none@1@S@*'''Accessibility Browser''' - A web browser specifically designed to assist people with visual impairments, to be released as open-source software.@@@@1@19@@danf@17-8-2009 10370470@unknown@formal@none@1@S@Also known as the "A-Browser," the technology will aim to eliminate the need for a mouse, relying instead entirely on voice controls, buttons and predefined shortcut keys.@@@@1@26@@danf@17-8-2009 10370480@unknown@formal@none@1@S@===Semiconductor design and manufacturing===@@@@1@4@@danf@17-8-2009 10370490@unknown@formal@none@1@S@Virtually all modern [[video game console|console gaming systems]] use [[IC design|microprocessors developed]] by IBM.@@@@1@14@@danf@17-8-2009 10370500@unknown@formal@none@1@S@The [[Xbox 360]] contains the [[Xenon (processor)|Xenon]] tri-core processor, which was designed and produced by IBM in less than 24 months.@@@@1@21@@danf@17-8-2009 10370510@unknown@formal@none@1@S@Sony's [[PlayStation 3]] features the [[Cell microprocessor| Cell BE microprocessor]] designed jointly by IBM, [[Toshiba]], and [[Sony]].@@@@1@17@@danf@17-8-2009 10370520@unknown@formal@none@1@S@[[Nintendo]]'s [[History of video game consoles (seventh generation)|seventh-generation]] console, [[Wii]], features an IBM chip codenamed [[Broadway (microprocessor)|Broadway]].@@@@1@17@@danf@17-8-2009 10370530@unknown@formal@none@1@S@The older [[Nintendo GameCube]] also utilizes the [[Gekko (microprocessor)|Gekko]] processor, designed by IBM.@@@@1@13@@danf@17-8-2009 10370540@unknown@formal@none@1@S@In May 2002, IBM and Butterfly.net, Inc. announced the Butterfly Grid, a commercial [[grid computing|grid]] for the online video gaming market.@@@@1@21@@danf@17-8-2009 10370550@unknown@formal@none@1@S@In March 2006, IBM announced separate agreements with Hoplon Infotainment, Online Game Services Incorporated (OGSI), and RenderRocket to provide on-demand content management and [[blade server]] computing resources.@@@@1@27@@danf@17-8-2009 10370560@unknown@formal@none@1@S@===Open Client Offering===@@@@1@3@@danf@17-8-2009 10370570@unknown@formal@none@1@S@IBM announced that it would launch its new software, called "Open Client Offering", which is to run on [[Microsoft]]'s [[Microsoft Windows|Windows]], [[Linux]] and [[Apple Inc.|Apple]]'s [[Macintosh]].@@@@1@25@@danf@17-8-2009 10370580@unknown@formal@none@1@S@The company states that its new product allows businesses to offer employees a choice of using the same software on Windows and its alternatives.@@@@1@24@@danf@17-8-2009 10370590@unknown@formal@none@1@S@This means that "Open Client Offering" is intended to cut the cost of managing Linux or Apple machines relative to Windows.@@@@1@19@@danf@17-8-2009 10370600@unknown@formal@none@1@S@Companies will no longer need to pay Microsoft for licenses, since their operations will no longer rely on Windows-based software.@@@@1@27@@danf@17-8-2009 10370610@unknown@formal@none@1@S@One alternative to Microsoft's office software is the Open Document Format software, whose development IBM supports.@@@@1@15@@danf@17-8-2009 10370620@unknown@formal@none@1@S@It is to be used for several tasks, such as word processing and presentations, along with collaboration through [[Lotus Notes]], instant messaging and blog tools, as well as an [[Internet Explorer]] competitor, the [[Firefox]] web browser.@@@@1@36@@danf@17-8-2009 10370630@unknown@formal@none@1@S@IBM plans to install Open Client on 5 percent of its desktop PCs.@@@@1@13@@danf@17-8-2009 
10370640@unknown@formal@none@1@S@===UC2: Unified Communications and Collaboration===@@@@1@5@@danf@17-8-2009 10370650@unknown@formal@none@1@S@'''UC2''' (''Unified Communications and Collaboration'') is an IBM and [[Cisco]] joint project based on [[Eclipse (software)|Eclipse]] and [[OSGi]].@@@@1@18@@danf@17-8-2009 10370660@unknown@formal@none@1@S@It will offer the numerous Eclipse application developers a unified platform for an easier work environment.@@@@1@16@@danf@17-8-2009 10370670@unknown@formal@none@1@S@The software based on UC2 platform will provide major enterprises with easy-to-use communication solutions, such as the Lotus based [[Sametime]].@@@@1@20@@danf@17-8-2009 10370680@unknown@formal@none@1@S@In the future the Sametime users will benefit from such additional functions as [[click-to-call]] and [[Voicemail|voice mailing]].@@@@1@17@@danf@17-8-2009 10370690@unknown@formal@none@1@S@===Internal programs===@@@@1@2@@danf@17-8-2009 10370700@unknown@formal@none@1@S@[[Extreme Blue]] is a company initiative that uses experienced IBM engineers, talented interns, and business managers to develop high-value technology.@@@@1@20@@danf@17-8-2009 10370710@unknown@formal@none@1@S@The project is designed to analyze emerging business needs and the technologies that can solve them.@@@@1@16@@danf@17-8-2009 10370720@unknown@formal@none@1@S@These projects mostly involve rapid-prototyping of high-profile software and hardware projects.@@@@1@11@@danf@17-8-2009 10370730@unknown@formal@none@1@S@In May 2007, IBM unveiled [[Project Big Green]] -- a re-direction of $1 billion per year across its businesses to increase energy efficiency.@@@@1@23@@danf@17-8-2009 10370740@unknown@formal@none@1@S@==IBM Software Group==@@@@1@3@@danf@17-8-2009 10370750@unknown@formal@none@1@S@This group is one of the major divisions of IBM.@@@@1@10@@danf@17-8-2009 10370760@unknown@formal@none@1@S@The various brands include:@@@@1@4@@danf@17-8-2009 10370770@unknown@formal@none@1@S@* [[IBM Information Management Software|Information Management Software]] — database servers and tools, text analytics, content management, business process management and business intelligence.@@@@1@22@@danf@17-8-2009 10370780@unknown@formal@none@1@S@* [[Lotus Software]] — Groupware, collaboration and business software.@@@@1@9@@danf@17-8-2009 10370790@unknown@formal@none@1@S@Acquired in 1995.@@@@1@3@@danf@17-8-2009 10370800@unknown@formal@none@1@S@* [[Rational Software]] — Software development and application lifecycle management.@@@@1@10@@danf@17-8-2009 10370810@unknown@formal@none@1@S@Acquired in 2002.@@@@1@3@@danf@17-8-2009 10370820@unknown@formal@none@1@S@* [[Tivoli Software]] — Systems management.@@@@1@6@@danf@17-8-2009 10370830@unknown@formal@none@1@S@Acquired in 1996.@@@@1@3@@danf@17-8-2009 10370840@unknown@formal@none@1@S@* [[IBM WebSphere|WebSphere]] — Integration and application infrastructure software.@@@@1@9@@danf@17-8-2009 10370850@unknown@formal@none@1@S@==Environmental record==@@@@1@2@@danf@17-8-2009 10370860@unknown@formal@none@1@S@IBM has a long history of dealing with its environmental problems.@@@@1@11@@danf@17-8-2009 10370870@unknown@formal@none@1@S@It established a corporate policy on environmental protection in 1971, with the support of a comprehensive global environmental management system.@@@@1@20@@danf@17-8-2009 10370880@unknown@formal@none@1@S@According to IBM’s stats, its total hazardous waste decreased by 44 percent over the past five years, and has decreased by 94.6 percent since 1987.@@@@1@25@@danf@17-8-2009 
10370890@unknown@formal@none@1@S@IBM's total hazardous waste calculation consists of waste from both non-manufacturing and manufacturing operations.@@@@1@14@@danf@17-8-2009 10370900@unknown@formal@none@1@S@Waste from manufacturing operations includes waste recycled in closed-loop systems, where process chemicals are recovered for subsequent reuse rather than simply being disposed of and replaced with new chemicals.@@@@1@28@@danf@17-8-2009 10370910@unknown@formal@none@1@S@Over the years, IBM has redesigned processes to eliminate almost all closed-loop recycling and now uses more environmentally friendly materials in their place.@@@@1@23@@danf@17-8-2009 10370920@unknown@formal@none@1@S@IBM was recognized as one of the "Top 20 Best Workplaces for Commuters" by the U.S. Environmental Protection Agency ([[EPA]]) in 2005.@@@@1@22@@danf@17-8-2009 10370930@unknown@formal@none@1@S@This was to recognize the Fortune 500 companies that provided their employees with excellent commuter benefits that helped reduce traffic and air pollution.@@@@1@23@@danf@17-8-2009 10370940@unknown@formal@none@1@S@However, the birthplace of IBM, [[Endicott, New York|Endicott]], suffered from IBM's pollution for decades.@@@@1@13@@danf@17-8-2009 10370950@unknown@formal@none@1@S@IBM used liquid cleaning agents in its circuit board assembly operation for more than two decades, and six spill and leak incidents were recorded, including one 1979 leak of 4,100 gallons from an underground tank.@@@@1@35@@danf@17-8-2009 10370960@unknown@formal@none@1@S@These left behind volatile organic compounds in the town's soil and aquifer.@@@@1@12@@danf@17-8-2009 10370970@unknown@formal@none@1@S@Trace elements of volatile organic compounds have been identified in Endicott’s drinking water, but the levels are within regulatory limits.@@@@1@21@@danf@17-8-2009 10370980@unknown@formal@none@1@S@Also, from 1980, IBM has pumped out 78,000 gallons of chemicals, including trichloroethane, Freon, benzene and perchloroethene, into the air and allegedly caused several cancer cases among the villagers.@@@@1@29@@danf@17-8-2009 10370990@unknown@formal@none@1@S@IBM Endicott has been identified by the Department of Environmental Conservation as the major source of pollution, though traces of contaminants from a local dry cleaner and other polluters were also found.@@@@1@32@@danf@17-8-2009 10371000@unknown@formal@none@1@S@Despite the amount of pollution, state health officials cannot say whether air or water pollution in Endicott has actually caused any health problems.@@@@1@23@@danf@17-8-2009 10371010@unknown@formal@none@1@S@Village officials say tests show that the water is safe to drink.@@@@1@12@@danf@17-8-2009 10371020@unknown@formal@none@1@S@=== Solar power ===@@@@1@4@@danf@17-8-2009 10371030@unknown@formal@none@1@S@Tokyo Ohka Kogyo Co., Ltd. 
(TOK) and IBM are collaborating to establish new, low-cost methods for bringing the next generation of solar energy products to market, that is, [[CIGS]] (Copper-Indium-Gallium-Selenide) [[solar cell]] modules.@@@@1@32@@danf@17-8-2009 10371040@unknown@formal@none@1@S@Use of [[thin film]] technology, such as CIGS, has great promise in reducing the overall cost of solar cells and further enabling their widespread adoption.@@@@1@25@@danf@17-8-2009 10371050@unknown@formal@none@1@S@IBM is exploring four main areas of photovoltaic research: using current technologies to develop cheaper and more efficient [[silicon]] [[solar cell]]s, developing new solution-processed [[thin film]] photovoltaic devices, [[concentrator photovoltaics]], and future-generation photovoltaic architectures based upon [[nanostructures]] such as [[semiconductor quantum dot]]s and [[nanowire]]s.@@@@1@46@@danf@17-8-2009 10371060@unknown@formal@none@1@S@Dr. Supratik Guha is the leading scientist in IBM photovoltaics.@@@@1@10@@danf@17-8-2009 10371070@unknown@formal@none@1@S@==Corporate culture of IBM==@@@@1@4@@danf@17-8-2009 10371080@unknown@formal@none@1@S@'''Big Blue''' is a nickname for IBM; several theories exist regarding its origin.@@@@1@13@@danf@17-8-2009 10371090@unknown@formal@none@1@S@One theory, substantiated by people who worked for IBM at the time, is that IBM field reps coined the term in the 1960s, referring to the color of the mainframes IBM installed in the 1960s and early 1970s.@@@@1@38@@danf@17-8-2009 10371100@unknown@formal@none@1@S@"All blue" was a term used to describe a loyal IBM customer, and business writers later picked up the term.@@@@1@20@@danf@17-8-2009 10371110@unknown@formal@none@1@S@Another theory suggests that Big Blue simply refers to the company's [[logo]].@@@@1@12@@danf@17-8-2009 10371120@unknown@formal@none@1@S@A third theory suggests that Big Blue refers to a former company dress code that required many IBM employees to wear only white shirts, and many of them wore blue suits.@@@@1@29@@danf@17-8-2009 10371130@unknown@formal@none@1@S@In any event, IBM keyboards, typewriters, and some other manufactured devices have played on the "Big Blue" concept, using the color for enter keys and carriage returns.@@@@1@27@@danf@17-8-2009 10371140@unknown@formal@none@1@S@===Sales===@@@@1@1@@danf@17-8-2009 10371150@unknown@formal@none@1@S@IBM has often been described as having a sales-centric or a sales-oriented business culture.@@@@1@14@@danf@17-8-2009 10371160@unknown@formal@none@1@S@Traditionally, many IBM executives and general managers are chosen from the sales force.@@@@1@13@@danf@17-8-2009 10371170@unknown@formal@none@1@S@The current CEO, [[Sam Palmisano]], for example, joined the company as a salesman and, unusually for CEOs of major corporations, has no MBA or postgraduate qualification.@@@@1@26@@danf@17-8-2009 10371180@unknown@formal@none@1@S@Middle and top management are often enlisted to give direct support to salesmen when pitching sales to important customers.@@@@1@19@@danf@17-8-2009 10371190@unknown@formal@none@1@S@===The uniform===@@@@1@2@@danf@17-8-2009 10371200@unknown@formal@none@1@S@A dark (or gray) suit, white shirt, and a "sincere" tie was the public uniform for IBM employees for most of the 20th Century.@@@@1@24@@danf@17-8-2009 10371210@unknown@formal@none@1@S@During IBM's management transformation in the 1990s, CEO [[Lou Gerstner]] relaxed these codes, normalizing the dress and behavior of IBM employees to resemble their counterparts in other large technology 
companies.@@@@1@30@@danf@17-8-2009 10371220@unknown@formal@none@1@S@===IBM company values and "Jam"===@@@@1@5@@danf@17-8-2009 10371230@unknown@formal@none@1@S@In 2003, IBM embarked on an ambitious project to rewrite company values.@@@@1@12@@danf@17-8-2009 10371240@unknown@formal@none@1@S@Using its ''Jam'' technology, the company hosted Intranet-based online discussions on key business issues with 50,000 employees over 3 days.@@@@1@20@@danf@17-8-2009 10371250@unknown@formal@none@1@S@The discussions were analyzed by sophisticated text analysis software (eClassifier) to mine online comments for themes.@@@@1@16@@danf@17-8-2009 10371260@unknown@formal@none@1@S@As a result of the 2003 Jam, the company values were updated to reflect three modern business, marketplace and employee views: "Dedication to every client's success", "Innovation that matters - for our company and for the world", "Trust and personal responsibility in all relationships".@@@@1@44@@danf@17-8-2009 10371270@unknown@formal@none@1@S@In 2004, another Jam was conducted during which 52,000 employees exchanged best practices for 72 hours.@@@@1@16@@danf@17-8-2009 10371280@unknown@formal@none@1@S@They focused on finding actionable ideas to support implementation of the values previously identified.@@@@1@14@@danf@17-8-2009 10371290@unknown@formal@none@1@S@A new post-Jam Ratings event was developed to allow IBMers to select key ideas that support the values.@@@@1@18@@danf@17-8-2009 10371300@unknown@formal@none@1@S@The board of directors cited this Jam when awarding Palmisano a pay rise in the spring of 2005.@@@@1@18@@danf@17-8-2009 10371310@unknown@formal@none@1@S@In July and September 2006, Palmisano launched another jam called [https://www.globalinnovationjam.com/ InnovationJam].@@@@1@12@@danf@17-8-2009 10371320@unknown@formal@none@1@S@InnovationJam was the largest online brainstorming session ever with more than 150,000 participants from 104 countries.@@@@1@16@@danf@17-8-2009 10371330@unknown@formal@none@1@S@The participants were IBM employees, members of IBM employees' families, universities, partners, and customers.@@@@1@14@@danf@17-8-2009 10371340@unknown@formal@none@1@S@InnovationJam was divided in two sessions (one in July and one in September) for 72 hours each and generated more than 46,000 ideas.@@@@1@23@@danf@17-8-2009 10371350@unknown@formal@none@1@S@In November 2006, IBM declared that they will invest $US 100 million in the 10 best ideas from InnovationJam.@@@@1@19@@danf@17-8-2009 10371360@unknown@formal@none@1@S@===Open source===@@@@1@2@@danf@17-8-2009 10371370@unknown@formal@none@1@S@IBM has been influenced by the [[Open Source Initiative]], and began supporting [[Linux]] in 1998.@@@@1@15@@danf@17-8-2009 10371380@unknown@formal@none@1@S@The company invests billions of dollars in services and software based on [[Linux]] through the IBM [[Linux Technology Center]], which includes over 300 [[Linux kernel]] developers.@@@@1@26@@danf@17-8-2009 10371390@unknown@formal@none@1@S@IBM has also released code under different [[open-source license]]s, such as the platform-independent software framework [[Eclipse (software)|Eclipse]] (worth approximately US$40 million at the time of the donation) and the [[Java (programming language)|Java]]-based [[relational database management system]] (RDBMS) [[Apache Derby]].@@@@1@39@@danf@17-8-2009 10371400@unknown@formal@none@1@S@IBM's open source involvement has not been trouble-free, however (see ''[[SCO v. 
IBM]]'').@@@@1@13@@danf@17-8-2009 10371410@unknown@formal@none@1@S@== Corporate affairs ==@@@@1@4@@danf@17-8-2009 10371420@unknown@formal@none@1@S@=== Diversity and workforce issues ===@@@@1@6@@danf@17-8-2009 10371430@unknown@formal@none@1@S@IBM's efforts to promote workforce diversity and equal opportunity date back at least to [[World War I]], when the company hired disabled veterans.@@@@1@23@@danf@17-8-2009 10371440@unknown@formal@none@1@S@IBM was the only technology company ranked in ''Working Mother'' magazine's Top 10 for 2004, and one of two technology companies in 2005 (the other company being Hewlett-Packard).@@@@1@28@@danf@17-8-2009 10371450@unknown@formal@none@1@S@On [[September 21]], [[1953]], [[Thomas J. Watson]], the CEO at the time, sent out a very controversial letter to all IBM employees stating that IBM needed to hire the best people, regardless of their race, ethnic origin, or gender.@@@@1@39@@danf@17-8-2009 10371460@unknown@formal@none@1@S@In 1984, IBM added sexual preference to this policy.@@@@1@6@@danf@17-8-2009 10371470@unknown@formal@none@1@S@Watson stated that this policy would give IBM a competitive advantage because IBM would then be able to hire talented people its competitors would turn down.@@@@1@25@@danf@17-8-2009 10371480@unknown@formal@none@1@S@The company has traditionally resisted [[trade union|labor union]] organizing, although unions represent some IBM workers outside the United States.@@@@1@19@@danf@17-8-2009 10371490@unknown@formal@none@1@S@In the 1990s, two major [[pension]] program changes, including a conversion to a cash balance plan, resulted in an employee [[class action]] lawsuit alleging [[age discrimination]].@@@@1@26@@danf@17-8-2009 10371500@unknown@formal@none@1@S@IBM employees won the lawsuit and arrived at a partial settlement, although appeals are still underway.@@@@1@16@@danf@17-8-2009 10371510@unknown@formal@none@1@S@IBM also settled a major overtime class-action lawsuit in 2006.@@@@1@10@@danf@17-8-2009 10371520@unknown@formal@none@1@S@Historically, IBM has had a good reputation for long-term staff retention, with few large-scale layoffs.@@@@1@16@@danf@17-8-2009 10371530@unknown@formal@none@1@S@In more recent years there have been a number of broad, sweeping cuts to the workforce as IBM attempts to adapt to changing market conditions and a declining profit base.@@@@1@30@@danf@17-8-2009 10371540@unknown@formal@none@1@S@After posting weaker-than-expected revenues in the first quarter of 2005, IBM eliminated 14,500 positions from its workforce, predominantly in Europe.@@@@1@22@@danf@17-8-2009 10371550@unknown@formal@none@1@S@In May 2005, IBM Ireland told staff that the MD (Micro-electronics Division) facility would close by the end of 2005 and offered them a settlement.@@@@1@27@@danf@17-8-2009 10371560@unknown@formal@none@1@S@However, all staff that wished to stay with the company were redeployed within IBM Ireland.@@@@1@15@@danf@17-8-2009 10371570@unknown@formal@none@1@S@Production moved to a company called Amkor in Singapore, which purchased IBM's microelectronics business there, and it is widely agreed that IBM promised this company full load capacity in return for the purchase of the facility.@@@@1@38@@danf@17-8-2009 10371580@unknown@formal@none@1@S@On [[June 8]] [[2005]], IBM Canada Ltd. 
eliminated approximately 700 positions.@@@@1@11@@danf@17-8-2009 10371590@unknown@formal@none@1@S@IBM presented these cuts as part of a strategy to "rebalance" its portfolio of professional skills and businesses.@@@@1@17@@danf@17-8-2009 10371600@unknown@formal@none@1@S@[[IBM India]] and other IBM offices in [[China]], the [[Philippines]] and [[Costa Rica]] have been witnessing a recruitment boom and steady growth in the number of employees due to lower wages.@@@@1@30@@danf@17-8-2009 10371610@unknown@formal@none@1@S@On [[October 10]] [[2005]], IBM became the first major company in the world to formally commit to not using [[genetic testing|genetic information]] in its employment decisions.@@@@1@26@@danf@17-8-2009 10371620@unknown@formal@none@1@S@This came just a few months after IBM announced its support of the [[National Geographic Society]]'s [[The Genographic Project|Genographic Project]].@@@@1@20@@danf@17-8-2009 10371630@unknown@formal@none@1@S@==== Gay rights ====@@@@1@4@@danf@17-8-2009 10371640@unknown@formal@none@1@S@IBM provides employees' same-sex partners with benefits and provides an anti-discrimination clause.@@@@1@12@@danf@17-8-2009 10371650@unknown@formal@none@1@S@The [[Human Rights Campaign]] has consistently rated IBM 100% on its index of gay-friendliness since 2003 (in 2002, the year it began compiling its report on major companies, IBM scored 86%).@@@@1@31@@danf@17-8-2009 10371660@unknown@formal@none@1@S@===Logos===@@@@1@1@@danf@17-8-2009 10371670@unknown@formal@none@1@S@[[Logo]]s designed in the 1970s tended to be sensitive to the technical limitations of photocopiers, which were then being widely deployed.@@@@1@21@@danf@17-8-2009 10371680@unknown@formal@none@1@S@A logo with large solid areas tended to be poorly copied by copiers in the 1970s, so companies preferred logos that avoided large solid areas.@@@@1@25@@danf@17-8-2009 10371690@unknown@formal@none@1@S@The 1972 IBM logos are an example of this tendency.@@@@1@10@@danf@17-8-2009 10371700@unknown@formal@none@1@S@With the advent of digital copiers in the mid-1980s, this technical restriction had largely disappeared; at roughly the same time, the 13-bar logo was abandoned for almost the opposite reason: it was difficult to render accurately on the low-resolution digital printers (240 dots per inch) of the time.@@@@1@48@@danf@17-8-2009 10371710@unknown@formal@none@1@S@===Board of directors===@@@@1@3@@danf@17-8-2009 10371720@unknown@formal@none@1@S@Current members of the [[board of directors]] of IBM are:@@@@1@10@@danf@17-8-2009 10371730@unknown@formal@none@1@S@*Cathleen Black President, [[Hearst Corporation|Hearst Magazines]]@@@@1@6@@danf@17-8-2009 10371740@unknown@formal@none@1@S@*[[William Brody]] President, [[Johns Hopkins University]]@@@@1@6@@danf@17-8-2009 10371750@unknown@formal@none@1@S@*[[Ken Chenault]] Chairman and CEO, [[American Express]] Company@@@@1@8@@danf@17-8-2009 10371760@unknown@formal@none@1@S@*Juergen Dormann Chairman of the Board, ABB Ltd@@@@1@8@@danf@17-8-2009 10371770@unknown@formal@none@1@S@*[[Michael Eskew]] Chairman and CEO, [[United Parcel Service]], Inc.@@@@1@9@@danf@17-8-2009 10371780@unknown@formal@none@1@S@*[[Shirley Ann Jackson]] President, [[Rensselaer Polytechnic Institute]]@@@@1@7@@danf@17-8-2009 10371790@unknown@formal@none@1@S@*Minoru Makihara Senior Corporate Advisor and former Chairman, [[Mitsubishi Corporation]]@@@@1@10@@danf@17-8-2009 10371800@unknown@formal@none@1@S@*Lucio Noto Managing Partner, Midstream Partners LLC@@@@1@7@@danf@17-8-2009 10371810@unknown@formal@none@1@S@*[[James W. 
Owens]] Chairman and CEO, [[Caterpillar Inc.]]@@@@1@8@@danf@17-8-2009 10371820@unknown@formal@none@1@S@*[[Samuel J. Palmisano]] Chairman, President and CEO, IBM@@@@1@8@@danf@17-8-2009 10371830@unknown@formal@none@1@S@*Joan Spero President, [[Doris Duke]] Charitable Foundation@@@@1@7@@danf@17-8-2009 10371840@unknown@formal@none@1@S@*Sidney Taurel Chairman and CEO, [[Eli Lilly and Company]]@@@@1@9@@danf@17-8-2009 10371850@unknown@formal@none@1@S@*[[Lorenzo Zambrano]] Chairman and CEO, [[Cemex]] SAB de CV@@@@1@9@@danf@17-8-2009 10380010@unknown@formal@none@1@S@
Information
@@@@1@1@@danf@17-8-2009 10380020@unknown@formal@none@1@S@'''Information''' as a [[Conveyed concept|concept]] has a diversity of meanings, from everyday usage to technical settings.@@@@1@16@@danf@17-8-2009 10380030@unknown@formal@none@1@S@Generally speaking, the concept of information is closely related to notions of [[constraint]], [[communication]], [[control system|control]], [[data]], [[form]], [[instruction]], [[knowledge]], [[Meaning (linguistics)|meaning]], [[stimulation|mental stimulus]], [[pattern]], [[perception]], and [[knowledge representation|representation]].@@@@1@29@@danf@17-8-2009 10380040@unknown@formal@none@1@S@Many people speak about the [[Information Age]] as the advent of the Knowledge Age or [[knowledge society]], the [[information society]], the [[Information revolution]], and [[Information technology|information technologies]], and even though [[informatics]], [[information science]] and [[computer science]] are often in the spotlight, the word "information" is often used without careful consideration of the various meanings it has acquired.@@@@1@57@@danf@17-8-2009 10380050@unknown@formal@none@1@S@== Etymology ==@@@@1@3@@danf@17-8-2009 10380060@unknown@formal@none@1@S@According to the [[Oxford English Dictionary]], the earliest historical meaning of the word ''information'' in [[English language|English]] was the act of ''informing'', or giving form or shape to the mind, as in education, instruction, or training.@@@@1@36@@danf@17-8-2009 10380070@unknown@formal@none@1@S@A quote from 1387: "Five books come down from heaven for information of mankind."@@@@1@14@@danf@17-8-2009 10380080@unknown@formal@none@1@S@It was also used for an ''item'' of training, ''e.g.'' a particular instruction.@@@@1@13@@danf@17-8-2009 10380090@unknown@formal@none@1@S@"Melibee had heard the great skills and reasons of Dame Prudence, and her wise information and techniques."@@@@1@17@@danf@17-8-2009 10380100@unknown@formal@none@1@S@(1386)@@@@1@1@@danf@17-8-2009 10380110@unknown@formal@none@1@S@The English word was apparently derived by adding the common "noun of action" ending "''-ation''" (descended through French from Latin "''-tio''") to the earlier verb ''to inform'', in the sense of to give form to the mind, to discipline, instruct, teach: "Men so wise should go and inform their kings."@@@@1@50@@danf@17-8-2009 10380120@unknown@formal@none@1@S@(1330) ''Inform'' itself comes (via French) from the Latin verb ''informare'', to give form to, to form an idea of.@@@@1@20@@danf@17-8-2009 10380125@unknown@formal@none@1@S@Furthermore, Latin itself already contained the word ''informatio'', meaning concept or idea, but the extent to which this may have influenced the development of the word ''information'' in English is unclear.@@@@1@32@@danf@17-8-2009 10380130@unknown@formal@none@1@S@As a final note, the ancient Greek word for ''form'' was [[eidos]], and this word was famously used in a technical philosophical sense by [[Plato]] (and later Aristotle) to denote the ideal identity or essence of something (see [[Theory of forms]]).@@@@1@41@@danf@17-8-2009 10380140@unknown@formal@none@1@S@"Eidos" can also be associated with [[thought]], [[proposition]] or even [[concept]].@@@@1@11@@danf@17-8-2009 10380150@unknown@formal@none@1@S@== Information as a message ==@@@@1@6@@danf@17-8-2009 10380160@unknown@formal@none@1@S@'''Information''' is the state of a system of interest.@@@@1@9@@danf@17-8-2009 10380170@unknown@formal@none@1@S@A message is information materialized.@@@@1@5@@danf@17-8-2009 
10380180@unknown@formal@none@1@S@Information is a quality of a [[message]] from a [[sender]] to one or more receivers.@@@@1@15@@danf@17-8-2009 10380190@unknown@formal@none@1@S@Information is always ''about'' something (size of a parameter, occurrence of an event, etc).@@@@1@14@@danf@17-8-2009 10380200@unknown@formal@none@1@S@Viewed in this manner, information does not have to be accurate.@@@@1@11@@danf@17-8-2009 10380210@unknown@formal@none@1@S@It may be a truth or a lie, or just the sound of a falling tree.@@@@1@16@@danf@17-8-2009 10380220@unknown@formal@none@1@S@Even a disruptive noise used to inhibit the flow of communication and create misunderstanding would in this view be a form of information.@@@@1@23@@danf@17-8-2009 10380230@unknown@formal@none@1@S@However, generally speaking, if the ''amount'' of information in the received message increases, the message is more accurate.@@@@1@18@@danf@17-8-2009 10380240@unknown@formal@none@1@S@This model assumes there is a definite [[sender]] and at least one receiver.@@@@1@13@@danf@17-8-2009 10380250@unknown@formal@none@1@S@Many refinements of the model assume the existence of a common language understood by the sender and at least one of the receivers.@@@@1@23@@danf@17-8-2009 10380260@unknown@formal@none@1@S@An important variation identifies information as that which would be communicated by a message if it were sent from a sender to a receiver capable of understanding the message.@@@@1@29@@danf@17-8-2009 10380270@unknown@formal@none@1@S@Notably, it is not required that the sender be capable of understanding the message, or even cognizant that there is a message.@@@@1@22@@danf@17-8-2009 10380280@unknown@formal@none@1@S@Thus, information is something that can be extracted from an environment, e.g., through observation, reading or measurement.@@@@1@17@@danf@17-8-2009 10380290@unknown@formal@none@1@S@Information is a term with many meanings depending on context, but is as a rule closely related to such concepts as meaning, knowledge, instruction, communication, representation, and mental stimulus.@@@@1@29@@danf@17-8-2009 10380300@unknown@formal@none@1@S@Simply stated, information is a message received and understood.@@@@1@9@@danf@17-8-2009 10380310@unknown@formal@none@1@S@In terms of data, it can be defined as a collection of facts from which conclusions may be drawn.@@@@1@19@@danf@17-8-2009 10380320@unknown@formal@none@1@S@There are many other aspects of information since it is the knowledge acquired through study or experience or instruction.@@@@1@19@@danf@17-8-2009 10380330@unknown@formal@none@1@S@But overall, information is the result of processing, manipulating and organizing data in a way that adds to the knowledge of the person receiving it.@@@@1@25@@danf@17-8-2009 10380340@unknown@formal@none@1@S@[[Communication theory]] provides a numerical measure of the uncertainty of an outcome.@@@@1@12@@danf@17-8-2009 10380350@unknown@formal@none@1@S@For example, we can say that "the signal contained thousands of bits of information".@@@@1@14@@danf@17-8-2009 10380360@unknown@formal@none@1@S@Communication theory tends to use the concept of [[information entropy]], generally attributed to [[C.E. Shannon]] (see below).@@@@1@17@@danf@17-8-2009 10380370@unknown@formal@none@1@S@Another form of information is [[Fisher information]], a concept of [[R.A. 
Fisher]].@@@@1@12@@danf@17-8-2009 10380380@unknown@formal@none@1@S@This is used in application of statistics to [[estimation theory]] and to science in general.@@@@1@15@@danf@17-8-2009 10380390@unknown@formal@none@1@S@Fisher information is thought of as the amount of information that a message carries about an unobservable parameter.@@@@1@18@@danf@17-8-2009 10380400@unknown@formal@none@1@S@It can be computed from knowledge of the [[likelihood function]] defining the system.@@@@1@13@@danf@17-8-2009 10380410@unknown@formal@none@1@S@For example, with a normal likelihood function, the Fisher information is the reciprocal of the variance of the law.@@@@1@19@@danf@17-8-2009 10380420@unknown@formal@none@1@S@In the absence of knowledge of the likelihood law, the Fisher information may be computed from normally distributed score data as the reciprocal of their second moment.@@@@1@27@@danf@17-8-2009 10380430@unknown@formal@none@1@S@Even though information and data are often used interchangeably, they are actually very different.@@@@1@14@@danf@17-8-2009 10380440@unknown@formal@none@1@S@Data is a set of unrelated information, and as such is of no use until it is properly evaluated.@@@@1@19@@danf@17-8-2009 10380450@unknown@formal@none@1@S@Upon evaluation, once there is some significant relation between the data and they show some relevance, they are converted into information.@@@@1@21@@danf@17-8-2009 10380460@unknown@formal@none@1@S@Now this same data can be used for different purposes.@@@@1@10@@danf@17-8-2009 10380470@unknown@formal@none@1@S@Thus, until the data convey some information, they are not useful.@@@@1@11@@danf@17-8-2009 10380480@unknown@formal@none@1@S@=== Measuring information entropy ===@@@@1@5@@danf@17-8-2009 10380490@unknown@formal@none@1@S@The view of information as a message came into prominence with the publication in 1948 of an influential paper by [[Claude Shannon]], "[[A Mathematical Theory of Communication]]."@@@@1@27@@danf@17-8-2009 10380500@unknown@formal@none@1@S@This paper provides the foundations of [[information theory]] and endows the word ''information'' not only with a technical meaning but also a measure.@@@@1@23@@danf@17-8-2009 10380510@unknown@formal@none@1@S@If the sending device is equally likely to send any one of a set of N messages, then the preferred measure of "the information produced when one message is chosen from the set" is the base two [[logarithm]] of N (This measure is called ''[[self-information]]'').@@@@1@45@@danf@17-8-2009 10380520@unknown@formal@none@1@S@In this paper, Shannon continues:@@@@1@5@@danf@17-8-2009 10380530@unknown@formal@none@1@S@A complementary way of measuring information is provided by [[algorithmic information theory]].@@@@1@12@@danf@17-8-2009 10380540@unknown@formal@none@1@S@In brief, this measures the information content of a list of symbols based on how predictable they are, or more specifically how easy it is to compute the list through a [[computer program|program]]: the information content of a sequence is the number of bits of the shortest program that computes it.@@@@1@51@@danf@17-8-2009 10380550@unknown@formal@none@1@S@The sequence below would have a very low algorithmic information measurement since it is a very predictable pattern, and as the pattern continues the measurement would not change.@@@@1@28@@danf@17-8-2009 10380560@unknown@formal@none@1@S@Shannon information would give the same information measurement for each symbol, since they are [[statistical randomness|statistically random]], and each new symbol would increase the measurement.@@@@1@25@@danf@17-8-2009 10380570@unknown@formal@none@1@S@:123456789101112131415161718192021@@@@1@1@@danf@17-8-2009
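As a brief illustrative sketch with assumed numbers: a device that chooses one of N = 8 equally likely messages produces
: I = \log_2 N = \log_2 8 = 3 \mbox{ bits}
of self-information per choice, and for messages sent with unequal probabilities p_i the average information per message, the entropy, is
: H = -\sum_i p_i \log_2 p_i ,
which reduces to \log_2 N when every p_i = 1/N. By contrast, the predictable sequence above can be produced by a very short program ("print the integers from 1 to 21"), which is why its algorithmic information content is low.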
10380580@unknown@formal@none@1@S@It is important to recognize the limitations of traditional information theory and algorithmic information theory from the perspective of human meaning.@@@@1@21@@danf@17-8-2009 10380590@unknown@formal@none@1@S@For example, when referring to the meaning content of a message, Shannon noted “Frequently the messages have ''meaning…'' these semantic aspects of communication are irrelevant to the engineering problem.@@@@1@29@@danf@17-8-2009 10380600@unknown@formal@none@1@S@The significant aspect is that the actual message is one selected ''from a set of possible messages''” (emphasis in original).@@@@1@20@@danf@17-8-2009 10380610@unknown@formal@none@1@S@In information theory signals are part of a process, not a substance; they do something, they do not contain any specific meaning.@@@@1@22@@danf@17-8-2009 10380620@unknown@formal@none@1@S@Combining algorithmic information theory and information theory, we can conclude that the most random signal contains the most information as it can be interpreted in any way and cannot be compressed.@@@@1@31@@danf@17-8-2009 10380630@unknown@formal@none@1@S@Michael Reddy noted that "'signals' of the [[mathematical theory]] are 'patterns that can be exchanged'.@@@@1@15@@danf@17-8-2009 10380640@unknown@formal@none@1@S@There is no message contained in the signal, the signals convey the ability to select from a set of possible messages."@@@@1@21@@danf@17-8-2009 10380650@unknown@formal@none@1@S@In information theory "the system must be designed to operate for each possible selection, not just the one which will actually be chosen since this is unknown at the time of design".@@@@1@32@@danf@17-8-2009 10380660@unknown@formal@none@1@S@== Information as a pattern ==@@@@1@6@@danf@17-8-2009 10380670@unknown@formal@none@1@S@Information is any represented [[pattern]].@@@@1@5@@danf@17-8-2009 10380680@unknown@formal@none@1@S@This view assumes neither accuracy nor directly communicating parties, but instead assumes a separation between an object and its representation.@@@@1@20@@danf@17-8-2009 10380690@unknown@formal@none@1@S@Consider the following example: [[economic statistics]] represent an [[Economics|economy]], however inaccurately.@@@@1@11@@danf@17-8-2009 10380700@unknown@formal@none@1@S@What are commonly referred to as data in [[computing]], [[statistics]], and other fields, are forms of information in this sense.@@@@1@20@@danf@17-8-2009 10380710@unknown@formal@none@1@S@The [[electromagnetism|electro-magnetic]] patterns in a [[computer network]] and connected [[peripheral device|device]]s are related to something other than the pattern itself, such as [[Character (computing)|text characters]] to be displayed and [[Computer keyboard|keyboard]] input.@@@@1@32@@danf@17-8-2009 10380720@unknown@formal@none@1@S@[[Signal (information theory)|Signal]]s, [[Sign (linguistics)|sign]]s, and [[symbol]]s are also in this category.@@@@1@12@@danf@17-8-2009 10380730@unknown@formal@none@1@S@On the other hand, according to [[semiotics]], data is symbols with a certain syntax and information is data with a certain semantics.@@@@1@21@@danf@17-8-2009 10380740@unknown@formal@none@1@S@[[Painting]] and [[drawing]] contain information to the extent that they represent something such as an assortment of objects on a table, a [[profile]], or a [[landscape]].@@@@1@26@@danf@17-8-2009 10380750@unknown@formal@none@1@S@In other words, when
a pattern of something is transposed to a pattern of something else, the latter is information.@@@@1@20@@danf@17-8-2009 10380760@unknown@formal@none@1@S@This would be the case whether or not there was anyone to perceive it.@@@@1@14@@danf@17-8-2009 10380770@unknown@formal@none@1@S@But if information can be defined merely as a pattern, does that mean that neither [[utility]] nor meaning is a necessary component of information?@@@@1@23@@danf@17-8-2009 10380780@unknown@formal@none@1@S@Arguably a distinction must be made between raw unprocessed data and information which possesses utility, [[value (economics)|value]] or some quantum of meaning.@@@@1@22@@danf@17-8-2009 10380790@unknown@formal@none@1@S@On this view, information may indeed be characterized as a pattern; but this is a [[necessary]] condition, not a [[sufficient]] one.@@@@1@21@@danf@17-8-2009 10380800@unknown@formal@none@1@S@An individual entry in a telephone book, which follows a specific pattern formed by name, address and telephone number, does not become "informative" in some sense unless and until it possesses some degree of utility, value or meaning.@@@@1@38@@danf@17-8-2009 10380810@unknown@formal@none@1@S@For example, someone might look up a girlfriend's number or order a takeaway, etc.@@@@1@15@@danf@17-8-2009 10380820@unknown@formal@none@1@S@The vast majority of numbers will never be construed as "information" in any meaningful sense.@@@@1@15@@danf@17-8-2009 10380830@unknown@formal@none@1@S@The gap between data and information is only closed by a behavioral bridge whereby some value, utility or meaning is added to transform mere data or pattern into information.@@@@1@29@@danf@17-8-2009 10380840@unknown@formal@none@1@S@When one constructs a representation of an object, one can selectively extract from the object ([[sampling (case studies)|sampling]]) or use a [[system]] of signs to replace ([[encode|encoding]]), or both.@@@@1@29@@danf@17-8-2009 10380850@unknown@formal@none@1@S@The sampling and encoding result in representation.@@@@1@7@@danf@17-8-2009 10380860@unknown@formal@none@1@S@An example of the former is a "sample" of a product; an example of the latter is a "verbal description" of a product.@@@@1@22@@danf@17-8-2009 10380870@unknown@formal@none@1@S@Both contain information about the product, however inaccurate.@@@@1@8@@danf@17-8-2009 10380880@unknown@formal@none@1@S@When one interprets representation, one can predict a broader pattern from a limited number of observations (inference) or understand the relation between patterns of two different things ([[decode|decoding]]).@@@@1@28@@danf@17-8-2009 10380890@unknown@formal@none@1@S@One example of the former is to sip a [[soup]] to know if it is spoiled; an example of the latter is examining footprints to determine the animal and its condition.@@@@1@31@@danf@17-8-2009 10380900@unknown@formal@none@1@S@In both cases, information sources are not constructed or presented by some "sender" of information.@@@@1@15@@danf@17-8-2009 10380910@unknown@formal@none@1@S@Regardless, information is dependent upon, but usually unrelated to and separate from, the medium or media used to express it.@@@@1@20@@danf@17-8-2009 10380920@unknown@formal@none@1@S@In other words, the position of a theoretical series of bits, or even the output once interpreted by a [[computer]] or similar device, is unimportant, except when someone or something is present to interpret the information.@@@@1@36@@danf@17-8-2009 10380930@unknown@formal@none@1@S@Therefore, a quantity of information is totally distinct from
its medium.@@@@1@11@@danf@17-8-2009 10380940@unknown@formal@none@1@S@== Information as sensory input ==@@@@1@6@@danf@17-8-2009 10380950@unknown@formal@none@1@S@Often information is viewed as a type of [[input]] to an [[organism]] or designed device.@@@@1@15@@danf@17-8-2009 10380960@unknown@formal@none@1@S@Inputs are of two kinds.@@@@1@5@@danf@17-8-2009 10380970@unknown@formal@none@1@S@Some inputs are important to the function of the organism (for example, food) or device ([[energy]]) by themselves.@@@@1@18@@danf@17-8-2009 10380980@unknown@formal@none@1@S@In his book ''Sensory Ecology,'' Dusenbery called these causal inputs.@@@@1@10@@danf@17-8-2009 10380990@unknown@formal@none@1@S@Other inputs (information) are important only because they are associated with causal inputs and can be used to predict the occurrence of a causal input at a later time (and perhaps another place).@@@@1@33@@danf@17-8-2009 10381000@unknown@formal@none@1@S@Some information is important because of association with other information but eventually there must be a connection to a causal input.@@@@1@21@@danf@17-8-2009 10381010@unknown@formal@none@1@S@In practice, information is usually carried by weak stimuli that must be detected by specialized sensory systems and amplified by energy inputs before they can be functional to the organism or device.@@@@1@32@@danf@17-8-2009 10381020@unknown@formal@none@1@S@For example, light is often a causal input to plants but provides information to animals.@@@@1@15@@danf@17-8-2009 10381030@unknown@formal@none@1@S@The colored light reflected from a flower is too weak to do much photosynthetic work but the visual system of the bee detects it and the bee's nervous system uses the information to guide the bee to the flower, where the bee often finds nectar or pollen, which are causal inputs, serving a nutritional function.@@@@1@55@@danf@17-8-2009 10381040@unknown@formal@none@1@S@Information is any type of sensory input.@@@@1@7@@danf@17-8-2009 10381050@unknown@formal@none@1@S@When an organism with a [[nervous system]] receives an input, it transforms the input into an electrical signal.@@@@1@18@@danf@17-8-2009 10381060@unknown@formal@none@1@S@This is regarded as information by some.@@@@1@6@@danf@17-8-2009 10381070@unknown@formal@none@1@S@The idea of representation is still relevant, but in a slightly different manner.@@@@1@13@@danf@17-8-2009 10381080@unknown@formal@none@1@S@That is, while [[abstract painting]] does not represent anything concretely, when the viewer sees the painting, it is nevertheless transformed into electrical signals that create a representation of the painting.@@@@1@30@@danf@17-8-2009 10381090@unknown@formal@none@1@S@Defined this way, information does not have to be related to truth, communication, or representation of an object.@@@@1@18@@danf@17-8-2009 10381100@unknown@formal@none@1@S@[[Entertainment]] in general is not intended to be informative.@@@@1@9@@danf@17-8-2009 10381110@unknown@formal@none@1@S@[[Music]], the [[performing arts]], [[amusement park]]s, works of [[fiction]] and so on are thus forms of information in this sense, but they are not necessarily forms of information according to some definitions given above.@@@@1@34@@danf@17-8-2009 10381120@unknown@formal@none@1@S@Consider another example: food supplies both nutrition and taste for those who eat it.@@@@1@14@@danf@17-8-2009 10381130@unknown@formal@none@1@S@If information is equated to sensory input, then nutrition is not information but taste is.@@@@1@15@@danf@17-8-2009 10381140@unknown@formal@none@1@S@==
Information as an influence which leads to a transformation ==@@@@1@11@@danf@17-8-2009 10381150@unknown@formal@none@1@S@Information is any type of pattern that influences the formation or transformation of other patterns.@@@@1@15@@danf@17-8-2009 10381160@unknown@formal@none@1@S@In this sense, there is no need for a conscious mind to perceive, much less appreciate, the pattern.@@@@1@18@@danf@17-8-2009 10381170@unknown@formal@none@1@S@Consider, for example, [[DNA]].@@@@1@4@@danf@17-8-2009 10381180@unknown@formal@none@1@S@The sequence of [[nucleotide]]s is a pattern that influences the formation and development of an organism without any need for a conscious mind.@@@@1@23@@danf@17-8-2009 10381190@unknown@formal@none@1@S@[[Systems theory]] at times seems to refer to information in this sense, assuming information does not necessarily involve any conscious mind, and patterns circulating (due to [[feedback]]) in the system can be called information.@@@@1@34@@danf@17-8-2009 10381200@unknown@formal@none@1@S@In other words, it can be said that information in this sense is something potentially perceived as representation, though not created or presented for that purpose.@@@@1@26@@danf@17-8-2009 10381210@unknown@formal@none@1@S@When [[Marshall McLuhan]] speaks of [[media (communication)|media]] and their effects on human cultures, he refers to the structure of [[cultural artifact|artifacts]] that in turn shape our behaviors and mindsets.@@@@1@29@@danf@17-8-2009 10381220@unknown@formal@none@1@S@Also, [[pheromone]]s are often said to be "information" in this sense.@@@@1@11@@danf@17-8-2009 10381230@unknown@formal@none@1@S@(See also [[Gregory Bateson]].)@@@@1@4@@danf@17-8-2009 10381240@unknown@formal@none@1@S@== Information as a property in physics ==@@@@1@8@@danf@17-8-2009 10381250@unknown@formal@none@1@S@In 2003, J. D. 
Bekenstein claimed there is a growing trend in [[physics]] to define the physical world as being made of information itself (and thus information is defined in this way).@@@@1@32@@danf@17-8-2009 10381260@unknown@formal@none@1@S@Information has a well-defined meaning in physics.@@@@1@8@@danf@17-8-2009 10381270@unknown@formal@none@1@S@Examples of this include the phenomenon of [[quantum entanglement]] where particles can interact without reference to their separation or the speed of light.@@@@1@23@@danf@17-8-2009 10381280@unknown@formal@none@1@S@Information itself cannot travel faster than light even if the information is transmitted indirectly.@@@@1@14@@danf@17-8-2009 10381290@unknown@formal@none@1@S@This could mean that all attempts at physically observing a particle with an "entangled" relationship to another are slowed down, even though the particles are not connected in any way other than by the information they carry.@@@@1@41@@danf@17-8-2009 10381300@unknown@formal@none@1@S@Another link is demonstrated by the [[Maxwell's demon]] thought experiment.@@@@1@10@@danf@17-8-2009 10381310@unknown@formal@none@1@S@In this experiment, a direct relationship between information and another physical property, [[entropy]], is demonstrated.@@@@1@15@@danf@17-8-2009 10381320@unknown@formal@none@1@S@A consequence is that it is impossible to destroy information without increasing the entropy of a system; in practical terms this often means generating heat.@@@@1@25@@danf@17-8-2009 10381330@unknown@formal@none@1@S@Another, more philosophical, outcome is that information could be thought of as interchangeable with [[Energy#Transformations_of_energy|energy]].@@@@1@15@@danf@17-8-2009 10381340@unknown@formal@none@1@S@Thus, in the study of [[logic gates]], the theoretical lower bound of thermal energy released by an ''AND gate'' is higher than for the ''NOT gate'' (because information is destroyed in an ''AND gate'' and simply converted in a ''NOT gate'').@@@@1@41@@danf@17-8-2009 10381350@unknown@formal@none@1@S@Physical information is of particular importance in the theory of [[quantum computers]].@@@@1@12@@danf@17-8-2009 10381360@unknown@formal@none@1@S@== Information as records ==@@@@1@5@@danf@17-8-2009 10381370@unknown@formal@none@1@S@Records are a specialized form of information.@@@@1@7@@danf@17-8-2009 10381380@unknown@formal@none@1@S@Essentially, records are information produced consciously or as by-products of business activities or transactions and retained because of their value.@@@@1@20@@danf@17-8-2009 10381390@unknown@formal@none@1@S@Primarily their value is as evidence of the activities of the organization but they may also be retained for their informational value.@@@@1@22@@danf@17-8-2009 10381400@unknown@formal@none@1@S@Sound [[records management]] ensures that the integrity of records is preserved for as long as they are required.@@@@1@18@@danf@17-8-2009 10381410@unknown@formal@none@1@S@The international standard on records management, ISO 15489, defines records as "information created, received, and maintained as evidence and information by an organization or person, in pursuance of legal obligations or in the transaction of business".@@@@1@36@@danf@17-8-2009 10381420@unknown@formal@none@1@S@The International Committee on Archives (ICA) Committee on electronic records defined a record as "a specific piece of recorded information generated, collected or received in the initiation, conduct or completion of an activity and that comprises sufficient content, context and structure
to provide proof or evidence of that activity".@@@@1@49@@danf@17-8-2009 10381430@unknown@formal@none@1@S@Records may be retained because of their business value, as part of the [[corporate memory]] of the organization or to meet legal, fiscal or accountability requirements imposed on the organization.@@@@1@30@@danf@17-8-2009 10381440@unknown@formal@none@1@S@Willis (2005) expressed the view that sound management of business records and information delivered "…six key requirements for good [[corporate governance]]…transparency; accountability; due process; compliance; meeting statutory and common law requirements; and security of personal and corporate information."@@@@1@38@@danf@17-8-2009 10381450@unknown@formal@none@1@S@== Information and semiotics ==@@@@1@5@@danf@17-8-2009 10381460@unknown@formal@none@1@S@Beynon-Davies explains the multi-faceted concept of information in terms of that of signs and sign-systems.@@@@1@15@@danf@17-8-2009 10381470@unknown@formal@none@1@S@Signs themselves can be considered in terms of four inter-dependent levels, layers or branches of [[semiotics]]: pragmatics, semantics, syntactics and empirics.@@@@1@21@@danf@17-8-2009 10381480@unknown@formal@none@1@S@These four layers serve to connect the social world on the one hand with the physical or technical world on the other.@@@@1@22@@danf@17-8-2009 10381490@unknown@formal@none@1@S@[[Pragmatics]] is concerned with the purpose of communication.@@@@1@8@@danf@17-8-2009 10381500@unknown@formal@none@1@S@Pragmatics links the issue of signs with that of intention.@@@@1@10@@danf@17-8-2009 10381510@unknown@formal@none@1@S@The focus of pragmatics is on the intentions of human agents underlying communicative behaviour.@@@@1@14@@danf@17-8-2009 10381520@unknown@formal@none@1@S@In other words, intentions link language to action.@@@@1@8@@danf@17-8-2009 10381530@unknown@formal@none@1@S@[[Semantics]] is concerned with the meaning of a message conveyed in a communicative act.@@@@1@14@@danf@17-8-2009 10381535@unknown@formal@none@1@S@Semantics considers the content of communication.@@@@1@6@@danf@17-8-2009 10381540@unknown@formal@none@1@S@Semantics is the study of the meaning of signs - the association between signs and behaviour.@@@@1@16@@danf@17-8-2009 10381550@unknown@formal@none@1@S@Semantics can be considered as the study of the link between symbols and their referents or concepts; particularly the way in which signs relate to human behaviour.@@@@1@27@@danf@17-8-2009 10381560@unknown@formal@none@1@S@Syntactics is concerned with the formalism used to represent a message.@@@@1@11@@danf@17-8-2009 10381570@unknown@formal@none@1@S@Syntactics as an area studies the form of communication in terms of the logic and grammar of sign systems.@@@@1@19@@danf@17-8-2009 10381580@unknown@formal@none@1@S@Syntactics is devoted to the study of the form rather than the content of signs and sign-systems.@@@@1@17@@danf@17-8-2009 10381590@unknown@formal@none@1@S@Empirics is the study of the signals used to carry a message; the physical characteristics of the medium of communication.@@@@1@20@@danf@17-8-2009 10381600@unknown@formal@none@1@S@Empirics is devoted to the study of communication channels and their characteristics, e.g., sound, light, electronic transmission etc.@@@@1@18@@danf@17-8-2009 10381610@unknown@formal@none@1@S@Communication normally exists within the context of some social situation.@@@@1@10@@danf@17-8-2009 10381620@unknown@formal@none@1@S@The social situation sets the context for the intentions conveyed (pragmatics) and the form in which 
communication takes place.@@@@1@19@@danf@17-8-2009 10381630@unknown@formal@none@1@S@In a communicative situation intentions are expressed through messages which comprise collections of inter-related signs taken from a language which is mutually understood by the agents involved in the communication.@@@@1@30@@danf@17-8-2009 10381640@unknown@formal@none@1@S@Mutual understanding implies that agents involved understand the chosen language in terms of its agreed syntax (syntactics) and semantics.@@@@1@19@@danf@17-8-2009 10381650@unknown@formal@none@1@S@The sender codes the message in the language and sends the message as signals along some communication channel (empirics).@@@@1@19@@danf@17-8-2009 10381660@unknown@formal@none@1@S@The chosen communication channel will have inherent properties which determine outcomes such as the speed with which communication can take place and over what distance.@@@@1@25@@danf@17-8-2009 10390010@unknown@formal@none@1@S@
Information extraction
@@@@1@2@@danf@17-8-2009 10390020@unknown@formal@none@1@S@In [[natural language processing]], '''information extraction''' (IE) is a type of [[information retrieval]] whose goal is to automatically extract structured information, i.e. categorized and contextually and semantically well-defined data from a certain domain, from unstructured [[machine-readable]] documents.@@@@1@37@@danf@17-8-2009 10390030@unknown@formal@none@1@S@An example of information extraction is the extraction of instances of corporate mergers, more formally MergerBetween(company_1, company_2, date), from an online news sentence such as: "Yesterday, New-York based Foo Inc. announced their acquisition of Bar Corp."@@@@1@36@@danf@17-8-2009 10390040@unknown@formal@none@1@S@A broad goal of IE is to allow computation to be done on the previously unstructured data.@@@@1@17@@danf@17-8-2009 10390050@unknown@formal@none@1@S@A more specific goal is to allow logical reasoning to draw inferences based on the logical content of the input data.@@@@1@21@@danf@17-8-2009 10390060@unknown@formal@none@1@S@The significance of IE is determined by the growing amount of information available in unstructured (i.e. without [[metadata]]) form, for instance on the Internet.@@@@1@24@@danf@17-8-2009 10390070@unknown@formal@none@1@S@This knowledge can be made more accessible by means of transformation into [[relational database|relational form]], or by marking-up with [[XML]] tags.@@@@1@21@@danf@17-8-2009 10390080@unknown@formal@none@1@S@An intelligent agent monitoring a news data feed requires IE to transform unstructured data into something that can be reasoned with.@@@@1@21@@danf@17-8-2009 10390090@unknown@formal@none@1@S@A typical application of IE is to scan a set of documents written in a [[natural language]] and populate a database with the information extracted.@@@@1@25@@danf@17-8-2009 10390100@unknown@formal@none@1@S@Current approaches to IE use [[natural language processing]] techniques that focus on very restricted domains.@@@@1@15@@danf@17-8-2009 10390110@unknown@formal@none@1@S@For example, the ''[[Message Understanding Conference]]'' (MUC) is a competition-based conference that focused on the following domains in the past:@@@@1@20@@danf@17-8-2009 10390120@unknown@formal@none@1@S@*MUC-1 (1987), MUC-2 (1989): Naval operations messages.@@@@1@7@@danf@17-8-2009 10390130@unknown@formal@none@1@S@*MUC-3 (1991), MUC-4 (1992): Terrorism in Latin American countries.@@@@1@9@@danf@17-8-2009 10390140@unknown@formal@none@1@S@*MUC-5 (1993): Joint ventures and microelectronics domain.@@@@1@7@@danf@17-8-2009 10390150@unknown@formal@none@1@S@*MUC-6 (1995): News articles on management changes.@@@@1@7@@danf@17-8-2009 10390160@unknown@formal@none@1@S@*MUC-7 (1998): Satellite launch reports.@@@@1@5@@danf@17-8-2009 10390170@unknown@formal@none@1@S@Natural language texts may need some form of [[Text simplification]] to create more easily machine-readable text from which to extract the sentences.@@@@1@25@@danf@17-8-2009 10390180@unknown@formal@none@1@S@Typical subtasks of IE are:@@@@1@5@@danf@17-8-2009 10390190@unknown@formal@none@1@S@* [[Named Entity Recognition]]: recognition of entity names (for people and organizations), place names, temporal expressions, and certain types of numerical expressions.@@@@1@22@@danf@17-8-2009 10390200@unknown@formal@none@1@S@* [[Coreference]]: identification of chains of [[noun phrase]]s that refer to the same object.@@@@1@13@@danf@17-8-2009 10390210@unknown@formal@none@1@S@For example, [[Anaphora (linguistics)|anaphora]] is a type
of coreference.@@@@1@9@@danf@17-8-2009 10390220@unknown@formal@none@1@S@* [[Terminology extraction]]: finding the relevant terms for a given [[text corpus|corpus]]@@@@1@12@@danf@17-8-2009 10390230@unknown@formal@none@1@S@* Relation Extraction: identification of relations between entities, such as:@@@@1@10@@danf@17-8-2009 10390240@unknown@formal@none@1@S@**PERSON works for ORGANIZATION (extracted from the sentence "Bill works for IBM.")@@@@1@12@@danf@17-8-2009 10390250@unknown@formal@none@1@S@**PERSON located in LOCATION (extracted from the sentence "Bill is in France.")@@@@1@12@@danf@17-8-2009 10400010@unknown@formal@none@1@S@
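A minimal sketch of the last subtask, using a toy regular-expression pattern rather than the statistical and linguistic machinery of actual IE systems (the function name and pattern below are illustrative assumptions):

 import re
 
 # Toy "PERSON works for ORGANIZATION" extractor; real systems rely on
 # named entity recognition and parsing instead of a fixed surface pattern.
 WORKS_FOR = re.compile(r"([A-Z]\w*) works for ([A-Z]\w*)")
 
 def extract_works_for(text):
     """Return (person, organization) pairs matched by the toy pattern."""
     return WORKS_FOR.findall(text)
 
 print(extract_works_for("Bill works for IBM."))  # [('Bill', 'IBM')]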
Information retrieval
@@@@1@2@@danf@17-8-2009 10400020@unknown@formal@none@1@S@'''Information retrieval''' ('''IR''') is the science of searching for documents, for [[information]] within documents and for [[Metadata (computing)|metadata]] about documents, as well as that of searching [[relational database]]s and the [[World Wide Web]].@@@@1@33@@danf@17-8-2009 10400030@unknown@formal@none@1@S@There is overlap in the usage of the terms data retrieval, [[document retrieval]], information retrieval, and [[text retrieval]], but each also has its own body of literature, theory, [[Praxis (process)|praxis]] and technologies.@@@@1@32@@danf@17-8-2009 10400040@unknown@formal@none@1@S@IR is [[interdisciplinary]], based on [[computer science]], [[mathematics]], [[library science]], [[information science]], [[information architecture]], [[cognitive psychology]], [[linguistics]], [[statistics]] and [[physics]].@@@@1@20@@danf@17-8-2009 10400050@unknown@formal@none@1@S@Automated information retrieval systems are used to reduce what has been called "[[information overload]]".@@@@1@14@@danf@17-8-2009 10400060@unknown@formal@none@1@S@Many universities and [[public library|public libraries]] use IR systems to provide access to books, journals and other documents.@@@@1@18@@danf@17-8-2009 10400070@unknown@formal@none@1@S@Web [[Web search engine|search engine]]s are the most visible [[Information retrieval applications|IR applications]].@@@@1@13@@danf@17-8-2009 10400080@unknown@formal@none@1@S@== History ==@@@@1@3@@danf@17-8-2009 10400090@unknown@formal@none@1@S@The idea of using computers to search for relevant pieces of information was popularized in an article ''[[As We May Think]]'' by [[Vannevar Bush]] in 1945.@@@@1@26@@danf@17-8-2009 10400100@unknown@formal@none@1@S@First implementations of information retrieval systems were introduced in the 1950s and 1960s.@@@@1@13@@danf@17-8-2009 10400110@unknown@formal@none@1@S@By 1990 several different techniques had been shown to perform well on small text corpora (several thousand documents).@@@@1@18@@danf@17-8-2009 10400120@unknown@formal@none@1@S@In 1992 the US Department of Defense, along with the [[National Institute of Standards and Technology]] (NIST), cosponsored the [[Text Retrieval Conference]] (TREC) as part of the TIPSTER text program.@@@@1@30@@danf@17-8-2009 10400130@unknown@formal@none@1@S@The aim of this was to look into the information retrieval community by supplying the infrastructure that was needed for evaluation of text retrieval methodologies on a very large text collection.@@@@1@31@@danf@17-8-2009 10400140@unknown@formal@none@1@S@This catalyzed research on methods that [[scalability|scale]] to huge corpora.@@@@1@10@@danf@17-8-2009 10400150@unknown@formal@none@1@S@The introduction of web [[Web search engine|search engine]]s has boosted the need for very large scale retrieval systems even further.@@@@1@20@@danf@17-8-2009 10400160@unknown@formal@none@1@S@The use of digital methods for storing and retrieving information has led to the phenomenon of [[digital obsolescence]], where a digital resource ceases to be readable because the physical media, the reader required to read the media, the hardware, or the software that runs on it, is no longer available.@@@@1@50@@danf@17-8-2009 10400170@unknown@formal@none@1@S@The information is initially easier to retrieve than if it were on paper, but is then effectively lost.@@@@1@18@@danf@17-8-2009 10400180@unknown@formal@none@1@S@=== Timeline ===@@@@1@3@@danf@17-8-2009 10400190@unknown@formal@none@1@S@* 1890: Hollerith tabulating 
machines were used to analyze the US census.@@@@1@12@@danf@17-8-2009 10400200@unknown@formal@none@1@S@([[Herman Hollerith]]).@@@@1@2@@danf@17-8-2009 10400210@unknown@formal@none@1@S@* 1945: [[Vannevar Bush]]'s ''[[As We May Think]]'' appeared in ''[[Atlantic Monthly]]''@@@@1@12@@danf@17-8-2009 10400220@unknown@formal@none@1@S@* Late 1940s: The US military confronted problems of indexing and retrieval of wartime scientific research documents captured from Germans.@@@@1@20@@danf@17-8-2009 10400230@unknown@formal@none@1@S@* 1947: [[Hans Peter Luhn]] (research engineer at IBM since 1941) began work on a mechanized, punch card based system for searching chemical compounds.@@@@1@24@@danf@17-8-2009 10400240@unknown@formal@none@1@S@* 1950: The term "information retrieval" may have been coined by [[Calvin Mooers]].@@@@1@13@@danf@17-8-2009 10400250@unknown@formal@none@1@S@* 1950s: Growing concern in the US for a "science gap" with the USSR motivated, encouraged funding, and provided a backdrop for mechanized literature searching systems ([[Allen Kent]] et al) and the invention of citation indexing ([[Eugene Garfield]]).@@@@1@38@@danf@17-8-2009 10400260@unknown@formal@none@1@S@* 1955: Allen Kent joined [[Case Western Reserve University]], and eventually becomes associate director of the Center for Documentation and Communications Research.@@@@1@22@@danf@17-8-2009 10400270@unknown@formal@none@1@S@That same year, Kent and colleagues publish a paper in American Documentation describing the precision and recall measures, as well as detailing a proposed "framework" for evaluating an IR system, which includes statistical sampling methods for determining the number of relevant documents not retrieved.@@@@1@44@@danf@17-8-2009 10400280@unknown@formal@none@1@S@* 1958: International Conference on Scientific Information Washington DC included consideration of IR systems as a solution to problems identified.@@@@1@20@@danf@17-8-2009 10400290@unknown@formal@none@1@S@See: Proceedings of the International Conference on Scientific Information, 1958 (National Academy of Sciences, Washington, DC, 1959)@@@@1@17@@danf@17-8-2009 10400300@unknown@formal@none@1@S@* 1959: Hans Peter Luhn published "Auto-encoding of documents for information retrieval."@@@@1@12@@danf@17-8-2009 10400310@unknown@formal@none@1@S@* 1960: Melvin Earl (Bill) Maron and J. L. Kuhns published "On relevance, probabilistic indexing, and information retrieval" in Journal of the ACM 7(3):216-244, July 1960.@@@@1@26@@danf@17-8-2009 10400320@unknown@formal@none@1@S@* Early 1960s: [[Gerard Salton]] began work on IR at Harvard, later moved to Cornell.@@@@1@15@@danf@17-8-2009 10400330@unknown@formal@none@1@S@* 1962: [[Cyril W. Cleverdon]] published early findings of the Cranfield studies, developing a model for IR system evaluation.@@@@1@19@@danf@17-8-2009 10400340@unknown@formal@none@1@S@See: Cyril W. Cleverdon, "Report on the Testing and Analysis of an Investigation into the Comparative Efficiency of Indexing Systems".@@@@1@20@@danf@17-8-2009 10400350@unknown@formal@none@1@S@Cranfield Coll. of Aeronautics, Cranfield, England, 1962.@@@@1@7@@danf@17-8-2009 10400360@unknown@formal@none@1@S@* 1962: Kent published Information Analysis and Retrieval@@@@1@8@@danf@17-8-2009 10400370@unknown@formal@none@1@S@* 1963: Weinberg report "Science, Government and Information" gave a full articulation of the idea of a "crisis of scientific information."@@@@1@21@@danf@17-8-2009 10400380@unknown@formal@none@1@S@The report was named after Dr. 
[[Alvin Weinberg]].@@@@1@8@@danf@17-8-2009 10400390@unknown@formal@none@1@S@* 1963: [[Joseph Becker]] and [[Robert M. Hayes]] published text on information retrieval.@@@@1@13@@danf@17-8-2009 10400400@unknown@formal@none@1@S@Becker, Joseph; Hayes, Robert Mayo.@@@@1@5@@danf@17-8-2009 10400410@unknown@formal@none@1@S@Information storage and retrieval: tools, elements, theories.@@@@1@7@@danf@17-8-2009 10400420@unknown@formal@none@1@S@New York, Wiley (1963).@@@@1@4@@danf@17-8-2009 10400430@unknown@formal@none@1@S@* 1964: [[Karen Spärck Jones]] finished her thesis at Cambridge, ''Synonymy and Semantic Classification'', and continued work on [[computational linguistics]] as it applies to IR@@@@1@25@@danf@17-8-2009 10400440@unknown@formal@none@1@S@* 1964: The [[National Bureau of Standards]] sponsored a symposium titled "Statistical Association Methods for Mechanized Documentation."@@@@1@17@@danf@17-8-2009 10400450@unknown@formal@none@1@S@Several highly significant papers, including G. Salton's first published reference (we believe) to the SMART system.@@@@1@16@@danf@17-8-2009 10400460@unknown@formal@none@1@S@* Mid-1960s: National Library of Medicine developed [[MEDLARS]] Medical Literature Analysis and Retrieval System, the first major machine-readable database and batch retrieval system@@@@1@23@@danf@17-8-2009 10400470@unknown@formal@none@1@S@* Mid-1960s: Project Intrex at MIT@@@@1@6@@danf@17-8-2009 10400480@unknown@formal@none@1@S@* 1965: [[J. C. R. Licklider]] published ''Libraries of the Future''@@@@1@11@@danf@17-8-2009 10400490@unknown@formal@none@1@S@* 1966: [[Don Swanson]] was involved in studies at University of Chicago on Requirements for Future Catalogs@@@@1@17@@danf@17-8-2009 10400500@unknown@formal@none@1@S@* 1968: Gerard Salton published ''Automatic Information Organization and Retrieval''.@@@@1@10@@danf@17-8-2009 10400510@unknown@formal@none@1@S@* 1968: [[J. W. Sammon]]'s RADC Tech report "Some Mathematics of Information Storage and Retrieval..." outlined the vector model.@@@@1@19@@danf@17-8-2009 10400520@unknown@formal@none@1@S@* 1969: Sammon's "A nonlinear mapping for data structure analysis" (IEEE Transactions on Computers) was the first proposal for visualization interface to an IR system.@@@@1@25@@danf@17-8-2009 10400530@unknown@formal@none@1@S@* Late 1960s: [[F. W. Lancaster]] completed evaluation studies of the MEDLARS system and published the first edition of his text on information retrieval@@@@1@24@@danf@17-8-2009 10400540@unknown@formal@none@1@S@* Early 1970s: first online systems--NLM's AIM-TWX, MEDLINE; Lockheed's Dialog; SDC's ORBIT@@@@1@12@@danf@17-8-2009 10400550@unknown@formal@none@1@S@* Early 1970s: [[Theodor Nelson]] promoting concept of [[hypertext]], published Computer Lib/Dream Machines@@@@1@13@@danf@17-8-2009 10400560@unknown@formal@none@1@S@* 1971: [[N. Jardine]] and [[C. J. Van Rijsbergen]] published "The use of hierarchic clustering in information retrieval", which articulated the "cluster hypothesis."@@@@1@23@@danf@17-8-2009 10400570@unknown@formal@none@1@S@(Information Storage and Retrieval, 7(5), pp. 
217-240, Dec 1971)@@@@1@9@@danf@17-8-2009 10400580@unknown@formal@none@1@S@*1975: Three highly influential publications by Salton fully articulated his vector processing framework and term discrimination model:@@@@1@17@@danf@17-8-2009 10400590@unknown@formal@none@1@S@** A Theory of Indexing (Society for Industrial and Applied Mathematics)@@@@1@11@@danf@17-8-2009 10400600@unknown@formal@none@1@S@** "A theory of term importance in automatic text analysis", (JASIS v. 26)@@@@1@13@@danf@17-8-2009 10400610@unknown@formal@none@1@S@** "A vector space model for automatic indexing", (CACM 18:11)@@@@1@10@@danf@17-8-2009 10400620@unknown@formal@none@1@S@* 1978: The First [[Association for Computing Machinery|ACM]] [[SIGIR]] conference.@@@@1@10@@danf@17-8-2009 10400630@unknown@formal@none@1@S@* 1979: C. J. Van Rijsbergen published ''Information Retrieval'' (Butterworths).@@@@1@10@@danf@17-8-2009 10400640@unknown@formal@none@1@S@Heavy emphasis on probabilistic models.@@@@1@5@@danf@17-8-2009 10400650@unknown@formal@none@1@S@* 1980: First international ACM SIGIR conference, joint with British Computer Society IR group in Cambridge@@@@1@16@@danf@17-8-2009 10400660@unknown@formal@none@1@S@* 1982: [[Nicholas J. Belkin|Belkin]], Oddy, and Brooks proposed the ASK (Anomalous State of Knowledge) viewpoint for information retrieval.@@@@1@19@@danf@17-8-2009 10400670@unknown@formal@none@1@S@This was an important concept, though their automated analysis tool proved ultimately disappointing.@@@@1@13@@danf@17-8-2009 10400680@unknown@formal@none@1@S@* 1983: Salton (and M. McGill) published Introduction to Modern Information Retrieval (McGraw-Hill), with heavy emphasis on vector models.@@@@1@19@@danf@17-8-2009 10400690@unknown@formal@none@1@S@* Mid-1980s: Efforts to develop end user versions of commercial IR systems.@@@@1@12@@danf@17-8-2009 10400700@unknown@formal@none@1@S@* 1985-1993: Key papers on and experimental systems for visualization interfaces.@@@@1@11@@danf@17-8-2009 10400710@unknown@formal@none@1@S@* Work by [[D. B. Crouch]], [[Robert R. Korfhage]], [[M. Chalmers]], [[A. Spoerri]] and others.@@@@1@15@@danf@17-8-2009 10400720@unknown@formal@none@1@S@* 1989: First [[World Wide Web]] proposals by [[Tim Berners-Lee]] at [[CERN]].@@@@1@12@@danf@17-8-2009 10400730@unknown@formal@none@1@S@* 1992: First TREC conference.@@@@1@5@@danf@17-8-2009 10400740@unknown@formal@none@1@S@* 1997: Publication of [[Robert R. 
Korfhage|Korfhage]]'s ''Information Storage and Retrieval'' with emphasis on visualization and multi-reference point systems.@@@@1@19@@danf@17-8-2009 10400750@unknown@formal@none@1@S@* Late 1990s: Web [[Web search engine|search engine]] implementation of many features formerly found only in experimental IR systems@@@@1@19@@danf@17-8-2009 10400760@unknown@formal@none@1@S@== Overview ==@@@@1@3@@danf@17-8-2009 10400770@unknown@formal@none@1@S@An information retrieval process begins when a user enters a query into the system.@@@@1@14@@danf@17-8-2009 10400780@unknown@formal@none@1@S@Queries are formal statements of [[information need]]s, for example search strings in web search engines.@@@@1@15@@danf@17-8-2009 10400790@unknown@formal@none@1@S@In information retrieval a query does not uniquely identify a single object in the collection.@@@@1@15@@danf@17-8-2009 10400800@unknown@formal@none@1@S@Instead, several objects may match the query, perhaps with different degrees of [[relevance|relevancy]].@@@@1@13@@danf@17-8-2009 10400810@unknown@formal@none@1@S@An object is an entity which keeps or stores information in a database.@@@@1@13@@danf@17-8-2009 10400820@unknown@formal@none@1@S@User queries are matched to objects stored in the database.@@@@1@10@@danf@17-8-2009 10400830@unknown@formal@none@1@S@Depending on the [[Information retrieval applications|application]] the data objects may be, for example, text documents, images or videos.@@@@1@18@@danf@17-8-2009 10400840@unknown@formal@none@1@S@Often the documents themselves are not kept or stored directly in the IR system, but are instead represented in the system by document surrogates.@@@@1@24@@danf@17-8-2009 10400850@unknown@formal@none@1@S@Most IR systems compute a numeric score on how well each object in the database matches the query, and rank the objects according to this value.@@@@1@26@@danf@17-8-2009 10400860@unknown@formal@none@1@S@The top ranking objects are then shown to the user.@@@@1@10@@danf@17-8-2009 10400870@unknown@formal@none@1@S@The process may then be iterated if the user wishes to refine the query.@@@@1@14@@danf@17-8-2009 10400880@unknown@formal@none@1@S@== Performance measures ==@@@@1@4@@danf@17-8-2009 10400890@unknown@formal@none@1@S@Many different measures for evaluating the performance of information retrieval systems have been proposed.@@@@1@14@@danf@17-8-2009 10400900@unknown@formal@none@1@S@The measures require a collection of documents and a query.@@@@1@10@@danf@17-8-2009 10400910@unknown@formal@none@1@S@All common measures described here assume a ground truth notion of relevancy: every document is known to be either relevant or non-relevant to a particular query.@@@@1@26@@danf@17-8-2009 10400920@unknown@formal@none@1@S@In practice queries may be [[ill-posed]] and there may be different shades of relevancy.@@@@1@14@@danf@17-8-2009 10400930@unknown@formal@none@1@S@=== Precision ===@@@@1@3@@danf@17-8-2009 10400940@unknown@formal@none@1@S@Precision is the fraction of the documents retrieved that are [[Relevance (information retrieval)|relevant]] to the user's information need.@@@@1@18@@danf@17-8-2009 10400950@unknown@formal@none@1@S@: \mbox{precision}=\frac{|\{\mbox{relevant documents}\}\cap\{\mbox{retrieved documents}\}|}{|\{\mbox{retrieved documents}\}|} @@@@1@6@@danf@17-8-2009 10400960@unknown@formal@none@1@S@In [[binary classification]], precision is analogous to [[positive predictive value]].@@@@1@10@@danf@17-8-2009 10400970@unknown@formal@none@1@S@Precision takes all retrieved documents into
account.@@@@1@7@@danf@17-8-2009 10400980@unknown@formal@none@1@S@It can also be evaluated at a given cut-off rank, considering only the topmost results returned by the system.@@@@1@19@@danf@17-8-2009 10400990@unknown@formal@none@1@S@This measure is called ''precision at n'' or ''P@n''.@@@@1@9@@danf@17-8-2009 10401000@unknown@formal@none@1@S@Note that the meaning and usage of "precision" in the field of Information Retrieval differs from the definition of [[accuracy and precision]] within other branches of science and technology.@@@@1@29@@danf@17-8-2009 10401010@unknown@formal@none@1@S@=== Recall ===@@@@1@3@@danf@17-8-2009 10401020@unknown@formal@none@1@S@Recall is the fraction of the documents that are relevant to the query that are successfully retrieved.@@@@1@17@@danf@17-8-2009 10401030@unknown@formal@none@1@S@:\mbox{recall}=\frac{|\{\mbox{relevant documents}\}\cap\{\mbox{retrieved documents}\}|}{|\{\mbox{relevant documents}\}|} @@@@1@5@@danf@17-8-2009 10401040@unknown@formal@none@1@S@In binary classification, recall is called [[sensitivity (tests)|sensitivity]].@@@@1@8@@danf@17-8-2009 10401050@unknown@formal@none@1@S@So it can be looked at as ''the probability that a relevant document is retrieved by the query''.@@@@1@18@@danf@17-8-2009 10401060@unknown@formal@none@1@S@It is trivial to achieve recall of 100% by returning all documents in response to any query.@@@@1@17@@danf@17-8-2009 10401070@unknown@formal@none@1@S@Therefore, recall alone is not enough; one also needs to measure the number of non-relevant documents, for example by computing the precision.@@@@1@23@@danf@17-8-2009 10401080@unknown@formal@none@1@S@=== Fall-Out ===@@@@1@3@@danf@17-8-2009 10401090@unknown@formal@none@1@S@The proportion of non-relevant documents that are retrieved, out of all non-relevant documents available:@@@@1@14@@danf@17-8-2009 10401100@unknown@formal@none@1@S@: \mbox{fall-out}=\frac{|\{\mbox{non-relevant documents}\}\cap\{\mbox{retrieved documents}\}|}{|\{\mbox{non-relevant documents}\}|} @@@@1@6@@danf@17-8-2009 10401110@unknown@formal@none@1@S@In binary classification, fall-out is closely related to [[specificity (tests)|specificity]].@@@@1@10@@danf@17-8-2009 10401120@unknown@formal@none@1@S@More precisely: \mbox{fall-out}=1-\mbox{specificity}.@@@@1@3@@danf@17-8-2009 10401130@unknown@formal@none@1@S@It can be looked at as ''the probability that a non-relevant document is retrieved by the query''.@@@@1@17@@danf@17-8-2009 10401140@unknown@formal@none@1@S@It is trivial to achieve fall-out of 0% by returning zero documents in response to any query.@@@@1@17@@danf@17-8-2009 10401150@unknown@formal@none@1@S@=== F-measure ===@@@@1@3@@danf@17-8-2009 10401160@unknown@formal@none@1@S@The weighted [[harmonic mean]] of precision and recall, the traditional F-measure or balanced F-score is:@@@@1@15@@danf@17-8-2009 10401170@unknown@formal@none@1@S@:F = 2 \cdot (\mathrm{precision} \cdot \mathrm{recall}) / (\mathrm{precision} + \mathrm{recall}).\,@@@@1@11@@danf@17-8-2009 10401180@unknown@formal@none@1@S@This is also known as the F_1 measure, because recall and precision are evenly weighted.@@@@1@15@@danf@17-8-2009 10401190@unknown@formal@none@1@S@The general formula for non-negative real ß is:@@@@1@8@@danf@17-8-2009 10401200@unknown@formal@none@1@S@:F_\beta = (1 + \beta^2) \cdot (\mathrm{precision} \cdot \mathrm{recall}) / (\beta^2 \cdot \mathrm{precision} + \mathrm{recall}).\,@@@@1@15@@danf@17-8-2009 10401210@unknown@formal@none@1@S@Two other commonly used F measures are the F_{2} measure, which weights recall twice as much as precision, and the F_{0.5} measure, which weights precision twice as much as recall.@@@@1@30@@danf@17-8-2009
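As a small worked example with assumed numbers: suppose a query returns 10 documents, of which 4 are relevant, while the collection contains 8 relevant documents in total. Then
: \mbox{precision} = 4/10 = 0.4, \qquad \mbox{recall} = 4/8 = 0.5,
: F_1 = 2 \cdot (0.4 \cdot 0.5)/(0.4 + 0.5) \approx 0.44, \qquad F_2 = 5 \cdot (0.4 \cdot 0.5)/(4 \cdot 0.4 + 0.5) \approx 0.48,
so F_2 exceeds F_1 here because recall, which F_2 weights more heavily, is the stronger of the two values.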
10401220@unknown@formal@none@1@S@The F-measure was derived by van Rijsbergen (1979) so that F_\beta "measures the effectiveness of retrieval with respect to a user who attaches ß times as much importance to recall as precision".@@@@1@32@@danf@17-8-2009 10401230@unknown@formal@none@1@S@It is based on van Rijsbergen's effectiveness measure E = 1-(1/(\alpha/P + (1-\alpha)/R)).@@@@1@13@@danf@17-8-2009 10401240@unknown@formal@none@1@S@Their relationship is F_\beta = 1 - E where \alpha=1/(\beta^2+1).@@@@1@10@@danf@17-8-2009 10401250@unknown@formal@none@1@S@=== Average precision of precision and recall ===@@@@1@7@@danf@17-8-2009 10401260@unknown@formal@none@1@S@The precision and recall are based on the whole list of documents returned by the system.@@@@1@16@@danf@17-8-2009 10401270@unknown@formal@none@1@S@Average precision emphasizes returning more relevant documents earlier.@@@@1@8@@danf@17-8-2009 10401280@unknown@formal@none@1@S@It is the average of precisions computed after truncating the list after each of the relevant documents in turn:@@@@1@18@@danf@17-8-2009 10401290@unknown@formal@none@1@S@: \operatorname{AveP} = \frac{\sum_{r=1}^N (P(r) \times \mathrm{rel}(r))}{\mbox{number of relevant documents}} \!@@@@1@11@@danf@17-8-2009 10401300@unknown@formal@none@1@S@where ''r'' is the rank, ''N'' the number retrieved, ''rel()'' a binary function on the relevance of a given rank, and ''P()'' precision at a given cut-off rank.@@@@1@28@@danf@17-8-2009 10401310@unknown@formal@none@1@S@== Model types ==@@@@1@4@@danf@17-8-2009 10401320@unknown@formal@none@1@S@[[Image:Information-Retrieval-Models.png|thumb|500px|categorization of IR-models (translated from [http://de.wikipedia.org/wiki/Informationsrückgewinnung#Klassifikation_von_Modellen_zur_Repr.C3.A4sentation_nat.C3.BCrlichsprachlicher_Dokumente German entry], original source [http://www.logos-verlag.de/cgi-bin/engbuchmid?isbn=0514&lng=eng&id= Dominik Kuropka])]]@@@@1@13@@danf@17-8-2009 10401325@unknown@formal@none@1@S@For information retrieval to be efficient, the documents are typically transformed into a suitable representation.@@@@1@16@@danf@17-8-2009 10401330@unknown@formal@none@1@S@There are several representations.@@@@1@4@@danf@17-8-2009 10401340@unknown@formal@none@1@S@The picture on the right illustrates the relationship of some common models.@@@@1@12@@danf@17-8-2009 10401350@unknown@formal@none@1@S@In the picture, the models are categorized according to two dimensions: the mathematical basis and the properties of the model.@@@@1@20@@danf@17-8-2009 10401360@unknown@formal@none@1@S@=== First dimension: mathematical basis ===@@@@1@6@@danf@17-8-2009 10401370@unknown@formal@none@1@S@* ''Set-theoretic models'' represent documents as sets of words or phrases.@@@@1@11@@danf@17-8-2009 10401380@unknown@formal@none@1@S@Similarities are usually derived from set-theoretic operations on those sets.@@@@1@10@@danf@17-8-2009 10401390@unknown@formal@none@1@S@Common models are:@@@@1@3@@danf@17-8-2009 10401400@unknown@formal@none@1@S@** [[Standard Boolean model]]@@@@1@4@@danf@17-8-2009 10401410@unknown@formal@none@1@S@** [[Extended Boolean model]]@@@@1@4@@danf@17-8-2009 10401420@unknown@formal@none@1@S@** [[Fuzzy retrieval]]@@@@1@3@@danf@17-8-2009 10401430@unknown@formal@none@1@S@* ''Algebraic models'' represent documents and queries usually as vectors, matrices or
10401310@unknown@formal@none@1@S@== Model types ==@@@@1@4@@danf@17-8-2009 10401320@unknown@formal@none@1@S@[[Image:Information-Retrieval-Models.png|thumb|500px|categorization of IR-models (translated from [http://de.wikipedia.org/wiki/Informationsrückgewinnung#Klassifikation_von_Modellen_zur_Repr.C3.A4sentation_nat.C3.BCrlichsprachlicher_Dokumente German entry], original source [http://www.logos-verlag.de/cgi-bin/engbuchmid?isbn=0514&lng=eng&id= Dominik Kuropka])]]@@@@1@13@@danf@17-8-2009 10401325@unknown@formal@none@1@S@For information retrieval to be efficient, the documents are typically transformed into a suitable representation.@@@@1@16@@danf@17-8-2009 10401330@unknown@formal@none@1@S@There are several representations.@@@@1@4@@danf@17-8-2009 10401340@unknown@formal@none@1@S@The picture on the right illustrates the relationship of some common models.@@@@1@12@@danf@17-8-2009 10401350@unknown@formal@none@1@S@In the picture, the models are categorized according to two dimensions: the mathematical basis and the properties of the model.@@@@1@20@@danf@17-8-2009 10401360@unknown@formal@none@1@S@=== First dimension: mathematical basis ===@@@@1@6@@danf@17-8-2009 10401370@unknown@formal@none@1@S@* ''Set-theoretic models'' represent documents as sets of words or phrases.@@@@1@11@@danf@17-8-2009 10401380@unknown@formal@none@1@S@Similarities are usually derived from set-theoretic operations on those sets.@@@@1@10@@danf@17-8-2009 10401390@unknown@formal@none@1@S@Common models are:@@@@1@3@@danf@17-8-2009 10401400@unknown@formal@none@1@S@** [[Standard Boolean model]]@@@@1@4@@danf@17-8-2009 10401410@unknown@formal@none@1@S@** [[Extended Boolean model]]@@@@1@4@@danf@17-8-2009 10401420@unknown@formal@none@1@S@** [[Fuzzy retrieval]]@@@@1@3@@danf@17-8-2009 10401430@unknown@formal@none@1@S@* ''Algebraic models'' represent documents and queries usually as vectors, matrices or tuples.@@@@1@13@@danf@17-8-2009 10401440@unknown@formal@none@1@S@The similarity of the query vector and document vector is represented as a scalar value.@@@@1@15@@danf@17-8-2009 10401450@unknown@formal@none@1@S@** [[Vector space model]]@@@@1@4@@danf@17-8-2009 10401460@unknown@formal@none@1@S@** [[Generalized vector space model]]@@@@1@5@@danf@17-8-2009 10401470@unknown@formal@none@1@S@** Topic-based vector space model (literature: [http://www.kuropka.net/files/TVSM.pdf], [http://www.logos-verlag.de/cgi-bin/engbuchmid?isbn=0514&lng=eng&id=])@@@@1@8@@danf@17-8-2009 10401480@unknown@formal@none@1@S@** [[Extended Boolean model]]@@@@1@4@@danf@17-8-2009 10401490@unknown@formal@none@1@S@** Enhanced topic-based vector space model (literature: [http://kuropka.net/files/HPI_Evaluation_of_eTVSM.pdf], [http://www.logos-verlag.de/cgi-bin/engbuchmid?isbn=0514&lng=eng&id=])@@@@1@9@@danf@17-8-2009 10401500@unknown@formal@none@1@S@** Latent semantic indexing, also known as [[latent semantic analysis]]@@@@1@8@@danf@17-8-2009 10401510@unknown@formal@none@1@S@* ''Probabilistic models'' treat the process of document retrieval as a probabilistic inference.@@@@1@13@@danf@17-8-2009 10401520@unknown@formal@none@1@S@Similarities are computed as probabilities that a document is relevant for a given query.@@@@1@14@@danf@17-8-2009 10401530@unknown@formal@none@1@S@Probabilistic theorems like [[Bayes' theorem]] are often used in these models.@@@@1@12@@danf@17-8-2009 10401540@unknown@formal@none@1@S@** [[Binary independence retrieval]]@@@@1@4@@danf@17-8-2009 10401550@unknown@formal@none@1@S@** [[Probabilistic relevance model (BM25)]]@@@@1@5@@danf@17-8-2009 10401560@unknown@formal@none@1@S@** Uncertain inference@@@@1@3@@danf@17-8-2009 10401570@unknown@formal@none@1@S@** [[Language model]]s@@@@1@3@@danf@17-8-2009 10401580@unknown@formal@none@1@S@** [[Divergence-from-randomness model]]@@@@1@3@@danf@17-8-2009 10401590@unknown@formal@none@1@S@** [[Latent Dirichlet allocation]]@@@@1@4@@danf@17-8-2009 10401600@unknown@formal@none@1@S@=== Second dimension: properties of the model ===@@@@1@8@@danf@17-8-2009 10401610@unknown@formal@none@1@S@* ''Models without term-interdependencies'' treat different terms/words as independent.@@@@1@9@@danf@17-8-2009 10401620@unknown@formal@none@1@S@This assumption is usually represented in vector space models by the [[orthogonality]] assumption of term vectors or in probabilistic models by an [[statistical independence|independence]] assumption for term variables.@@@@1@27@@danf@17-8-2009 10401630@unknown@formal@none@1@S@* ''Models with immanent term interdependencies'' allow a representation of interdependencies between terms.@@@@1@13@@danf@17-8-2009 10401640@unknown@formal@none@1@S@However, the degree of the interdependency between two terms is defined by the model itself.@@@@1@15@@danf@17-8-2009 10401650@unknown@formal@none@1@S@It is usually directly or indirectly derived (e.g. 
by [[dimension reduction|dimensional reduction]]) from the [[co-occurrence]] of those terms in the whole set of documents.@@@@1@24@@danf@17-8-2009 10401660@unknown@formal@none@1@S@* ''Models with transcendent term interdependencies'' allow a representation of interdependencies between terms, but they do not specify how the interdependency between two terms is defined.@@@@1@26@@danf@17-8-2009 10401670@unknown@formal@none@1@S@They rely on an external source for the degree of interdependency between two terms.@@@@1@13@@danf@17-8-2009 10401680@unknown@formal@none@1@S@(For example, a human or a sophisticated algorithm.)@@@@1@7@@danf@17-8-2009 10401690@unknown@formal@none@1@S@== Major figures ==@@@@1@4@@danf@17-8-2009 10401700@unknown@formal@none@1@S@* [[Gerard Salton]]@@@@1@3@@danf@17-8-2009 10401710@unknown@formal@none@1@S@* [[Hans Peter Luhn]]@@@@1@4@@danf@17-8-2009 10401720@unknown@formal@none@1@S@* [http://ciir.cs.umass.edu/personnel/croft.html W. Bruce Croft]@@@@1@5@@danf@17-8-2009 10401730@unknown@formal@none@1@S@* [[Karen Spärck Jones]]@@@@1@4@@danf@17-8-2009 10401740@unknown@formal@none@1@S@* [[C. J. van Rijsbergen]]@@@@1@5@@danf@17-8-2009 10401750@unknown@formal@none@1@S@* [http://www.soi.city.ac.uk/~ser/homepage.html Stephen E. Robertson]@@@@1@5@@danf@17-8-2009 10401760@unknown@formal@none@1@S@== Awards in the field ==@@@@1@6@@danf@17-8-2009 10401770@unknown@formal@none@1@S@* [[Tony Kent Strix award]]@@@@1@5@@danf@17-8-2009 10401780@unknown@formal@none@1@S@* [[Gerard Salton Award]]@@@@1@4@@danf@17-8-2009 10410010@unknown@formal@none@1@S@
Information theory
@@@@1@2@@danf@17-8-2009 10410020@unknown@formal@none@1@S@'''Information theory''' is a branch of [[applied mathematics]] and [[electrical engineering]] involving the quantification of [[information]].@@@@1@16@@danf@17-8-2009 10410030@unknown@formal@none@1@S@Historically, information theory was developed to find fundamental limits on compressing and reliably [[communication|communicating]] data.@@@@1@15@@danf@17-8-2009 10410040@unknown@formal@none@1@S@Since its inception it has broadened to find applications in many other areas, including [[statistical inference]], [[natural language processing]], [[cryptography]] generally, [[networks]] other than communication networks -- as in [[neurobiology]], the evolution and function of molecular codes, model selection in ecology, thermal physics, [[quantum computing]], plagiarism detection and other forms of [[data analysis]].@@@@1@53@@danf@17-8-2009 10410050@unknown@formal@none@1@S@A key measure of information in the theory is known as [[information entropy]], which is usually expressed by the average number of bits needed for storage or communication.@@@@1@28@@danf@17-8-2009 10410060@unknown@formal@none@1@S@Intuitively, entropy quantifies the uncertainty involved when encountering a [[random variable]].@@@@1@11@@danf@17-8-2009 10410070@unknown@formal@none@1@S@For example, a fair coin flip (2 equally likely outcomes) will have less entropy than a roll of a die (6 equally likely outcomes).@@@@1@24@@danf@17-8-2009 10410080@unknown@formal@none@1@S@Applications of fundamental topics of information theory include [[lossless data compression]] (e.g. [[ZIP (file format)|ZIP files]]), [[lossy data compression]] (e.g. [[MP3]]s), and [[channel capacity|channel coding]] (e.g. for [[DSL]] lines).@@@@1@29@@danf@17-8-2009 10410110@unknown@formal@none@1@S@The field is at the intersection of [[mathematics]], [[statistics]], [[computer science]], [[physics]], [[neurobiology]], and [[electrical engineering]].@@@@1@16@@danf@17-8-2009 10410120@unknown@formal@none@1@S@Its impact has been crucial to the success of the [[Voyager program|Voyager]] missions to deep space, the invention of the CD, the feasibility of mobile phones, the development of the [[Internet]], the study of [[linguistics]] and of human perception, the understanding of [[black hole]]s, and numerous other fields.@@@@1@48@@danf@17-8-2009 10410130@unknown@formal@none@1@S@Important sub-fields of information theory are source coding, channel coding, algorithmic complexity theory, algorithmic information theory, and measures of information.@@@@1@20@@danf@17-8-2009 10410140@unknown@formal@none@1@S@==Overview==@@@@1@1@@danf@17-8-2009 10410150@unknown@formal@none@1@S@The main concepts of information theory can be grasped by considering the most widespread means of human communication: language.@@@@1@19@@danf@17-8-2009 10410160@unknown@formal@none@1@S@Two important aspects of a good language are as follows: First, the most common words (e.g., "a", "the", "I") should be shorter than less common words (e.g., "benefit", "generation", "mediocre"), so that sentences will not be too long.@@@@1@38@@danf@17-8-2009 10410170@unknown@formal@none@1@S@Such a tradeoff in word length is analogous to [[data compression]] and is the essential aspect of [[source coding]].@@@@1@19@@danf@17-8-2009 10410180@unknown@formal@none@1@S@Second, if part of a sentence is unheard or misheard due to noise (e.g., a passing car), the listener should still be able to glean the meaning of the underlying message.@@@@1@33@@danf@17-8-2009 
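The word-length trade-off described above can be made concrete with a small sketch that builds a [[Huffman coding|Huffman]]-style prefix code over an invented word-frequency table; more frequent words receive shorter codewords. The words are those mentioned above, and the counts are made up purely for illustration.

<source lang="python">
import heapq
from collections import Counter

def huffman_code(frequencies):
    """Build a prefix code in which more frequent symbols get shorter codewords."""
    if len(frequencies) == 1:  # degenerate case: a single symbol
        return {symbol: "0" for symbol in frequencies}
    # Heap entries are (weight, tie_breaker, {symbol: codeword-so-far}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(frequencies.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)   # merge the two least frequent subtrees
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

# Invented word counts, for illustration only: common words end up with short codewords.
counts = Counter({"the": 50, "a": 40, "I": 30, "benefit": 5, "generation": 3, "mediocre": 2})
for word, code in sorted(huffman_code(counts).items(), key=lambda kv: -counts[kv[0]]):
    print(word, counts[word], code)
</source>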
10410190@unknown@formal@none@1@S@Such robustness is as essential for an electronic communication system as it is for a language; properly building such robustness into communications is done by [[Channel capacity|channel coding]].@@@@1@28@@danf@17-8-2009 10410200@unknown@formal@none@1@S@Source coding and channel coding are the fundamental concerns of information theory.@@@@1@12@@danf@17-8-2009 10410210@unknown@formal@none@1@S@Note that these concerns have nothing to do with the ''importance'' of messages.@@@@1@13@@danf@17-8-2009 10410220@unknown@formal@none@1@S@For example, a platitude such as "Thank you; come again" takes about as long to say or write as the urgent plea, "Call an ambulance!" while clearly the latter is more important and more meaningful.@@@@1@35@@danf@17-8-2009 10410230@unknown@formal@none@1@S@Information theory, however, does not consider message importance or meaning, as these are matters of the quality of data rather than the quantity and readability of data, the latter of which is determined solely by probabilities.@@@@1@36@@danf@17-8-2009 10410240@unknown@formal@none@1@S@Information theory is generally considered to have been founded in 1948 by [[Claude Elwood Shannon|Claude Shannon]] in his seminal work, "[[A Mathematical Theory of Communication]]."@@@@1@25@@danf@17-8-2009 10410250@unknown@formal@none@1@S@The central paradigm of classical information theory is the engineering problem of the transmission of information over a noisy channel.@@@@1@20@@danf@17-8-2009 10410260@unknown@formal@none@1@S@The most fundamental results of this theory are Shannon's [[source coding theorem]], which establishes that, on average, the number of ''bits'' needed to represent the result of an uncertain event is given by its [[information entropy|entropy]]; and Shannon's [[noisy-channel coding theorem]], which states that ''reliable'' communication is possible over ''noisy'' channels provided that the rate of communication is below a certain threshold called the channel capacity.@@@@1@66@@danf@17-8-2009 10410270@unknown@formal@none@1@S@The channel capacity can be approached in practice by using appropriate encoding and decoding systems.@@@@1@15@@danf@17-8-2009 10410280@unknown@formal@none@1@S@Information theory is closely associated with a collection of pure and applied disciplines that have been investigated and reduced to engineering practice under a variety of rubrics throughout the world over the past half century or more: [[adaptive system]]s, [[anticipatory system]]s, [[artificial intelligence]], [[complex system]]s, [[complexity science]], [[cybernetics]], [[informatics]], [[machine learning]], along with [[systems science]]s of many descriptions.@@@@1@58@@danf@17-8-2009 10410290@unknown@formal@none@1@S@Information theory is a broad and deep mathematical theory, with equally broad and deep applications, amongst which is the vital field of [[coding theory]].@@@@1@24@@danf@17-8-2009 10410300@unknown@formal@none@1@S@Coding theory is concerned with finding explicit methods, called ''codes'', of increasing the efficiency and reducing the net error rate of data communication over a noisy channel to near the limit that Shannon proved is the maximum possible for that channel.@@@@1@41@@danf@17-8-2009 10410310@unknown@formal@none@1@S@These codes can be roughly subdivided into [[data compression]] (source coding) and [[error-correction]] (channel coding) techniques.@@@@1@16@@danf@17-8-2009 10410320@unknown@formal@none@1@S@In the latter case, it took many years to find the methods Shannon's 
work proved were possible.@@@@1@17@@danf@17-8-2009 10410330@unknown@formal@none@1@S@A third class of information theory codes are cryptographic algorithms (both [[code (cryptography)|code]]s and [[cipher]]s).@@@@1@15@@danf@17-8-2009 10410340@unknown@formal@none@1@S@Concepts, methods and results from coding theory and information theory are widely used in [[cryptography]] and [[cryptanalysis]].@@@@1@17@@danf@17-8-2009 10410350@unknown@formal@none@1@S@''See the article [[ban (information)]] for a historical application.''@@@@1@9@@danf@17-8-2009 10410360@unknown@formal@none@1@S@Information theory is also used in [[information retrieval]], [[intelligence (information gathering)|intelligence gathering]], [[gambling]], [[statistics]], and even in [[musical composition]].@@@@1@19@@danf@17-8-2009 10410370@unknown@formal@none@1@S@==Historical background==@@@@1@2@@danf@17-8-2009 10410380@unknown@formal@none@1@S@The landmark event that established the discipline of information theory, and brought it to immediate worldwide attention, was the publication of [[Claude E. Shannon]]'s classic paper "[[A Mathematical Theory of Communication]]" in the ''[[Bell System Technical Journal]]'' in July and October of 1948.@@@@1@43@@danf@17-8-2009 10410390@unknown@formal@none@1@S@Prior to this paper, limited information theoretic ideas had been developed at Bell Labs, all implicitly assuming events of equal probability.@@@@1@21@@danf@17-8-2009 10410400@unknown@formal@none@1@S@[[Harry Nyquist]]'s 1924 paper, ''Certain Factors Affecting Telegraph Speed,'' contains a theoretical section quantifying "intelligence" and the "line speed" at which it can be transmitted by a communication system, giving the relation W = K \\log m, where ''W'' is the speed of transmission of intelligence, ''m'' is the number of different voltage levels to choose from at each time step, and ''K'' is a constant.@@@@1@66@@danf@17-8-2009 10410410@unknown@formal@none@1@S@[[Ralph Hartley]]'s 1928 paper, ''Transmission of Information,'' uses the word ''information'' as a measurable quantity, reflecting the receiver's ability to distinguish that one sequence of symbols from any other, thus quantifying information as H = \\log S^n = n \\log S, where ''S'' was the number of possible symbols, and ''n'' the number of symbols in a transmission.@@@@1@58@@danf@17-8-2009 10410420@unknown@formal@none@1@S@The natural unit of information was therefore the decimal digit, much later renamed the [[ban (information)|hartley]] in his honour as a unit or scale or measure of information.@@@@1@28@@danf@17-8-2009 10410430@unknown@formal@none@1@S@[[Alan Turing]] in 1940 used similar ideas as part of the statistical analysis of the breaking of the German second world war [[Cryptanalysis of the Enigma|Enigma]] ciphers.@@@@1@27@@danf@17-8-2009 10410440@unknown@formal@none@1@S@Much of the mathematics behind information theory with events of different probabilities was developed for the field of [[thermodynamics]] by [[Ludwig Boltzmann]] and [[J. 
Willard Gibbs]].@@@@1@26@@danf@17-8-2009 10410450@unknown@formal@none@1@S@Connections between information-theoretic entropy and thermodynamic entropy, including the important contributions by [[Rolf Landauer]] in the 1960s, are explored in ''[[Entropy in thermodynamics and information theory]]''.@@@@1@26@@danf@17-8-2009 10410460@unknown@formal@none@1@S@In Shannon's revolutionary and groundbreaking paper, the work for which had been substantially completed at Bell Labs by the end of 1944, Shannon for the first time introduced the qualitative and quantitative model of communication as a statistical process underlying information theory, opening with the assertion that@@@@1@47@@danf@17-8-2009 10410470@unknown@formal@none@1@S@:"The fundamental problem of communication is that of reproducing at one point, either exactly or approximately, a message selected at another point."@@@@1@22@@danf@17-8-2009 10410480@unknown@formal@none@1@S@With it came the ideas of@@@@1@6@@danf@17-8-2009 10410490@unknown@formal@none@1@S@* the [[information entropy]] and [[redundancy (information theory)|redundancy]] of a source, and its relevance through the [[source coding theorem]];@@@@1@19@@danf@17-8-2009 10410500@unknown@formal@none@1@S@* the [[mutual information]], and the [[channel capacity]] of a noisy channel, including the promise of perfect loss-free communication given by the [[noisy-channel coding theorem]];@@@@1@25@@danf@17-8-2009 10410510@unknown@formal@none@1@S@* the practical result of the [[Shannon–Hartley law]] for the channel capacity of a Gaussian channel; and of course@@@@1@19@@danf@17-8-2009 10410520@unknown@formal@none@1@S@* the [[bit]]—a new way of seeing the most fundamental unit of information@@@@1@13@@danf@17-8-2009 10410530@unknown@formal@none@1@S@==Ways of measuring information==@@@@1@4@@danf@17-8-2009 10410540@unknown@formal@none@1@S@Information theory is based on [[probability theory]] and [[statistics]].@@@@1@9@@danf@17-8-2009 10410550@unknown@formal@none@1@S@The most important quantities of information are [[Information entropy|entropy]], the information in a [[random variable]], and [[mutual information]], the amount of information in common between two random variables.@@@@1@28@@danf@17-8-2009 10410560@unknown@formal@none@1@S@The former quantity indicates how easily message data can be [[data compression|compressed]] while the latter can be used to find the communication rate across a [[Channel (communications)|channel]].@@@@1@27@@danf@17-8-2009 10410570@unknown@formal@none@1@S@The choice of logarithmic base in the following formulae determines the [[units of measurement|unit]] of [[information entropy]] that is used.@@@@1@20@@danf@17-8-2009 10410580@unknown@formal@none@1@S@The most common unit of information is the [[bit]], based on the [[binary logarithm]].@@@@1@14@@danf@17-8-2009 10410590@unknown@formal@none@1@S@Other units include the [[nat (information)|nat]], which is based on the [[natural logarithm]], and the [[deciban|hartley]], which is based on the [[common logarithm]].@@@@1@23@@danf@17-8-2009 10410600@unknown@formal@none@1@S@In what follows, an expression of the form p \\log p \\, is considered by convention to be equal to zero whenever p=0.@@@@1@23@@danf@17-8-2009 10410605@unknown@formal@none@1@S@This is justified because \\lim_{p \\rightarrow 0+} p \\log p = 0 for any logarithmic base.@@@@1@16@@danf@17-8-2009 10410610@unknown@formal@none@1@S@===Entropy===@@@@1@1@@danf@17-8-2009 10410620@unknown@formal@none@1@S@The '''[[information entropy|entropy]]''', H, of a discrete 
random variable X is a measure of the amount of ''uncertainty'' associated with the value of X.@@@@1@24@@danf@17-8-2009 10410630@unknown@formal@none@1@S@Suppose one transmits 1000 bits (0s and 1s).@@@@1@8@@danf@17-8-2009 10410640@unknown@formal@none@1@S@If these bits are known ahead of transmission (to be a certain value with absolute probability), logic dictates that no information has been transmitted.@@@@1@24@@danf@17-8-2009 10410650@unknown@formal@none@1@S@If, however, each is equally and independently likely to be 0 or 1, 1000 bits (in the information theoretic sense) have been transmitted.@@@@1@23@@danf@17-8-2009 10410660@unknown@formal@none@1@S@Between these two extremes, information can be quantified as follows.@@@@1@10@@danf@17-8-2009 10410670@unknown@formal@none@1@S@If \\mathbb{X}\\, is the set of all messages x that X could be, and p(x) is the probability of x, then the entropy of X is defined:@@@@1@29@@danf@17-8-2009 10410680@unknown@formal@none@1@S@: H(X) = \\mathbb{E}_{X} [I(x)] = -\\sum_{x \\in \\mathbb{X}} p(x) \\log p(x).@@@@1@12@@danf@17-8-2009 10410690@unknown@formal@none@1@S@(Here, I(x) is the [[self-information]], which is the entropy contribution of an individual message.)@@@@1@14@@danf@17-8-2009 10410700@unknown@formal@none@1@S@An important property of entropy is that it is maximized when all the messages in the message space are equiprobable—i.e., most unpredictable—in which case H(X) = \\log |\\mathbb{X}|.@@@@1@28@@danf@17-8-2009 10410710@unknown@formal@none@1@S@The special case of information entropy for a random variable with two outcomes is the '''[[binary entropy function]]''':@@@@1@18@@danf@17-8-2009 10410720@unknown@formal@none@1@S@:H_\\mbox{b}(p) = - p \\log p - (1-p)\\log (1-p).\\,@@@@1@9@@danf@17-8-2009 10410730@unknown@formal@none@1@S@===Joint entropy===@@@@1@2@@danf@17-8-2009 10410740@unknown@formal@none@1@S@The '''[[joint entropy]]''' of two discrete random variables X and Y is merely the entropy of their pairing: (X, Y).@@@@1@20@@danf@17-8-2009 10410750@unknown@formal@none@1@S@This implies that if X and Y are [[statistical independence|independent]], then their joint entropy is the sum of their individual entropies.@@@@1@21@@danf@17-8-2009 10410760@unknown@formal@none@1@S@For example, if (X,Y) represents the position of a [[chess]] piece — X the row and Y the column, then the joint entropy of the row of the piece and the column of the piece will be the entropy of the position of the piece.@@@@1@45@@danf@17-8-2009 10410770@unknown@formal@none@1@S@:H(X, Y) = \\mathbb{E}_{X,Y} [-\\log p(x,y)] = - \\sum_{x, y} p(x, y) \\log p(x, y) \\,@@@@1@16@@danf@17-8-2009 10410780@unknown@formal@none@1@S@Despite similar notation, joint entropy should not be confused with '''[[cross entropy]]'''.@@@@1@12@@danf@17-8-2009 10410790@unknown@formal@none@1@S@===Conditional entropy (equivocation)===@@@@1@3@@danf@17-8-2009 10410800@unknown@formal@none@1@S@The '''[[conditional entropy]]''' or '''conditional uncertainty''' of X given random variable Y (also called the '''equivocation''' of X about Y) is the average conditional entropy over Y:@@@@1@27@@danf@17-8-2009 10410810@unknown@formal@none@1@S@: H(X|Y) = \\mathbb E_Y [H(X|y)] = -\\sum_{y \\in Y} p(y) \\sum_{x \\in X} p(x|y) \\log p(x|y) = -\\sum_{x,y} p(x,y) \\log \\frac{p(x,y)}{p(y)}.@@@@1@22@@danf@17-8-2009 10410820@unknown@formal@none@1@S@Because entropy can be conditioned on a random variable or on that random variable being a certain value, care should be taken not to confuse these two definitions of conditional entropy, the former of which is in more common use.@@@@1@40@@danf@17-8-2009
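These definitions can be evaluated numerically; the following sketch computes the entropy, joint entropy and conditional entropy of a small, invented joint distribution, using base-2 logarithms so that the results are in bits.

<source lang="python">
from math import log2

# A small, invented joint distribution p(x, y) over two binary variables.
p_xy = {("rain", "cloudy"): 0.30, ("rain", "clear"): 0.05,
        ("dry", "cloudy"): 0.20, ("dry", "clear"): 0.45}

def entropy(dist):
    """Entropy of a distribution given as {outcome: probability}, in bits."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

# Marginal distributions p(x) and p(y), obtained by summing the joint distribution.
p_x, p_y = {}, {}
for (x, y), p in p_xy.items():
    p_x[x] = p_x.get(x, 0.0) + p
    p_y[y] = p_y.get(y, 0.0) + p

H_X, H_Y, H_XY = entropy(p_x), entropy(p_y), entropy(p_xy)
# Conditional entropy, using the last form of the definition above.
H_X_given_Y = -sum(p * log2(p / p_y[y]) for (x, y), p in p_xy.items() if p > 0)
print(H_X, H_Y, H_XY, H_X_given_Y)  # roughly 0.93, 1.00, 1.72 and 0.72 bits
</source>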
10410830@unknown@formal@none@1@S@A basic property of this form of conditional entropy is that:@@@@1@11@@danf@17-8-2009 10410840@unknown@formal@none@1@S@: H(X|Y) = H(X,Y) - H(Y) .\\,@@@@1@8@@danf@17-8-2009 10410850@unknown@formal@none@1@S@===Mutual information (transinformation)===@@@@1@3@@danf@17-8-2009 10410860@unknown@formal@none@1@S@'''[[Mutual information]]''' measures the amount of information that can be obtained about one random variable by observing another.@@@@1@18@@danf@17-8-2009 10410870@unknown@formal@none@1@S@It is important in communication where it can be used to maximize the amount of information shared between sent and received signals.@@@@1@22@@danf@17-8-2009 10410880@unknown@formal@none@1@S@The mutual information of X relative to Y is given by:@@@@1@11@@danf@17-8-2009 10410890@unknown@formal@none@1@S@:I(X;Y) = \\mathbb{E}_{X,Y} [SI(x,y)] = \\sum_{x,y} p(x,y) \\log \\frac{p(x,y)}{p(x)\\, p(y)}@@@@1@10@@danf@17-8-2009 10410900@unknown@formal@none@1@S@where SI (''S''pecific mutual ''I''nformation) is the [[pointwise mutual information]].@@@@1@10@@danf@17-8-2009 10410910@unknown@formal@none@1@S@A basic property of the mutual information is that@@@@1@9@@danf@17-8-2009 10410920@unknown@formal@none@1@S@: I(X;Y) = H(X) - H(X|Y).\\,@@@@1@6@@danf@17-8-2009 10410930@unknown@formal@none@1@S@That is, knowing ''Y'', we can save an average of I(X; Y) bits in encoding ''X'' compared to not knowing ''Y''.@@@@1@21@@danf@17-8-2009 10410940@unknown@formal@none@1@S@Mutual information is [[symmetric function|symmetric]]:@@@@1@5@@danf@17-8-2009 10410950@unknown@formal@none@1@S@: I(X;Y) = I(Y;X) = H(X) + H(Y) - H(X,Y).\\,@@@@1@10@@danf@17-8-2009 10410960@unknown@formal@none@1@S@Mutual information can be expressed as the average [[Kullback–Leibler divergence]] (information gain) between the [[posterior probability|posterior probability distribution]] of ''X'' given the value of ''Y'' and the [[prior probability|prior distribution]] on ''X'':@@@@1@32@@danf@17-8-2009 10410970@unknown@formal@none@1@S@: I(X;Y) = \\mathbb E_{p(y)} [D_{\\mathrm{KL}}( p(X|Y=y) \\| p(X) )].@@@@1@10@@danf@17-8-2009 10410980@unknown@formal@none@1@S@In other words, this is a measure of how much, on the average, the probability distribution on ''X'' will change if we are given the value of ''Y''.@@@@1@28@@danf@17-8-2009 10410990@unknown@formal@none@1@S@This is often recalculated as the divergence of the actual joint distribution from the product of the marginal distributions:@@@@1@19@@danf@17-8-2009 10411000@unknown@formal@none@1@S@: I(X; Y) = D_{\\mathrm{KL}}(p(X,Y) \\| p(X)p(Y)).@@@@1@7@@danf@17-8-2009 10411010@unknown@formal@none@1@S@Mutual information is closely related to the [[likelihood-ratio test|log-likelihood ratio test]] in the context of contingency tables and the [[multinomial distribution]] and to [[Pearson's chi-square test|Pearson's χ2 test]]: mutual information can be considered a statistic for assessing independence between a pair of variables, and has a well-specified asymptotic distribution.@@@@1@49@@danf@17-8-2009 10411020@unknown@formal@none@1@S@===Kullback–Leibler divergence (information gain)===@@@@1@4@@danf@17-8-2009 10411030@unknown@formal@none@1@S@The '''[[Kullback–Leibler divergence]]''' (or '''information divergence''', '''information gain''', or '''relative entropy''') is a way of comparing two distributions: a "true" [[probability distribution]] ''p(X)'', and an arbitrary probability distribution 
''q(X)''.@@@@1@29@@danf@17-8-2009 10411040@unknown@formal@none@1@S@If we compress data in a manner that assumes ''q(X)'' is the distribution underlying some data, when, in reality, ''p(X)'' is the correct distribution, the Kullback–Leibler divergence is the average number of additional bits per datum necessary for compression.@@@@1@39@@danf@17-8-2009 10411050@unknown@formal@none@1@S@It is thus defined:@@@@1@4@@danf@17-8-2009 10411060@unknown@formal@none@1@S@:D_{\\mathrm{KL}}(p(X) \\| q(X)) = \\sum_{x \\in X} -p(x) \\log {q(x)} \\, - \\, \\left( -p(x) \\log {p(x)}\\right) = \\sum_{x \\in X} p(x) \\log \\frac{p(x)}{q(x)}.@@@@1@24@@danf@17-8-2009 10411070@unknown@formal@none@1@S@Although it is sometimes used as a 'distance metric', it is not a true [[Metric (mathematics)|metric]] since it is not symmetric and does not satisfy the [[triangle inequality]] (making it a semi-quasimetric).@@@@1@32@@danf@17-8-2009 10411080@unknown@formal@none@1@S@===Other quantities===@@@@1@2@@danf@17-8-2009 10411090@unknown@formal@none@1@S@Other important information theoretic quantities include [[Rényi entropy]] (a generalization of entropy) and [[differential entropy]] (a generalization of quantities of information to continuous distributions).@@@@1@24@@danf@17-8-2009 10411100@unknown@formal@none@1@S@==Coding theory==@@@@1@2@@danf@17-8-2009 10411110@unknown@formal@none@1@S@[[Coding theory]] is one of the most important and direct applications of information theory.@@@@1@14@@danf@17-8-2009 10411120@unknown@formal@none@1@S@It can be subdivided into [[data compression|source coding]] theory and [[error correction|channel coding]] theory.@@@@1@14@@danf@17-8-2009 10411130@unknown@formal@none@1@S@Using a statistical description for data, information theory quantifies the number of bits needed to describe the data, which is the information entropy of the source.@@@@1@26@@danf@17-8-2009 10411140@unknown@formal@none@1@S@* Data compression (source coding): There are two formulations for the compression problem:@@@@1@13@@danf@17-8-2009 10411150@unknown@formal@none@1@S@#[[lossless data compression]]: the data must be reconstructed exactly;@@@@1@9@@danf@17-8-2009 10411160@unknown@formal@none@1@S@#[[lossy data compression]]: allocates bits needed to reconstruct the data, within a specified fidelity level measured by a distortion function.@@@@1@20@@danf@17-8-2009 10411170@unknown@formal@none@1@S@This subset of information theory is called [[rate–distortion theory]].@@@@1@9@@danf@17-8-2009 10411180@unknown@formal@none@1@S@* Error-correcting codes (channel coding): While data compression removes as much [[redundancy (information theory)|redundancy]] as possible, an error correcting code adds just the right kind of redundancy (i.e. 
[[error correction]]) needed to transmit the data efficiently and faithfully across a noisy channel.@@@@1@42@@danf@17-8-2009 10411190@unknown@formal@none@1@S@This division of coding theory into compression and transmission is justified by the information transmission theorems, or source–channel separation theorems that justify the use of bits as the universal currency for information in many contexts.@@@@1@35@@danf@17-8-2009 10411200@unknown@formal@none@1@S@However, these theorems only hold in the situation where one transmitting user wishes to communicate to one receiving user.@@@@1@19@@danf@17-8-2009 10411210@unknown@formal@none@1@S@In scenarios with more than one transmitter (the multiple-access channel), more than one receiver (the [[broadcast channel]]) or intermediary "helpers" (the [[relay channel]]), or more general [[computer network|networks]], compression followed by transmission may no longer be optimal.@@@@1@37@@danf@17-8-2009 10411220@unknown@formal@none@1@S@[[Network information theory]] refers to these multi-agent communication models.@@@@1@9@@danf@17-8-2009 10411230@unknown@formal@none@1@S@===Source theory===@@@@1@2@@danf@17-8-2009 10411240@unknown@formal@none@1@S@Any process that generates successive messages can be considered a '''[[Communication source|source]]''' of information.@@@@1@14@@danf@17-8-2009 10411250@unknown@formal@none@1@S@A memoryless source is one in which each message is an [[Independent identically-distributed random variables|independent identically-distributed random variable]], whereas the properties of [[ergodic theory|ergodicity]] and [[stationary process|stationarity]] impose more general constraints.@@@@1@31@@danf@17-8-2009 10411260@unknown@formal@none@1@S@All such sources are [[stochastic process|stochastic]].@@@@1@6@@danf@17-8-2009 10411270@unknown@formal@none@1@S@These terms are well studied in their own right outside information theory.@@@@1@12@@danf@17-8-2009 10411280@unknown@formal@none@1@S@====Rate====@@@@1@1@@danf@17-8-2009 10411290@unknown@formal@none@1@S@Information [[Entropy rate|'''rate''']] is the average entropy per symbol.@@@@1@9@@danf@17-8-2009 10411300@unknown@formal@none@1@S@For memoryless sources, this is merely the entropy of each symbol, while, in the case of a stationary stochastic process, it is@@@@1@22@@danf@17-8-2009 10411310@unknown@formal@none@1@S@:r = \\lim_{n \\to \\infty} H(X_n|X_{n-1},X_{n-2},X_{n-3}, \\ldots);@@@@1@7@@danf@17-8-2009 10411320@unknown@formal@none@1@S@that is, the conditional entropy of a symbol given all the previous symbols generated.@@@@1@14@@danf@17-8-2009 10411330@unknown@formal@none@1@S@For the more general case of a process that is not necessarily stationary, the ''average rate'' is@@@@1@17@@danf@17-8-2009 10411340@unknown@formal@none@1@S@:r = \\lim_{n \\to \\infty} \\frac{1}{n} H(X_1, X_2, \\dots X_n);@@@@1@10@@danf@17-8-2009 10411350@unknown@formal@none@1@S@that is, the limit of the joint entropy per symbol.@@@@1@10@@danf@17-8-2009 10411360@unknown@formal@none@1@S@For stationary sources, these two expressions give the same result.@@@@1@10@@danf@17-8-2009 10411370@unknown@formal@none@1@S@It is common in information theory to speak of the "rate" or "entropy" of a language.@@@@1@16@@danf@17-8-2009 10411380@unknown@formal@none@1@S@This is appropriate, for example, when the source of information is English prose.@@@@1@13@@danf@17-8-2009 10411390@unknown@formal@none@1@S@The rate of a source of information is related to its [[redundancy (information theory)|redundancy]] and how well it can be [[data 
compression|compressed]], the subject of '''source coding'''.@@@@1@27@@danf@17-8-2009 10411400@unknown@formal@none@1@S@===Channel capacity===@@@@1@2@@danf@17-8-2009 10411410@unknown@formal@none@1@S@Communications over a channel—such as an [[ethernet]] wire—is the primary motivation of information theory.@@@@1@14@@danf@17-8-2009 10411420@unknown@formal@none@1@S@As anyone who's ever used a telephone (mobile or landline) knows, however, such channels often fail to produce exact reconstruction of a signal; noise, periods of silence, and other forms of signal corruption often degrade quality.@@@@1@36@@danf@17-8-2009 10411430@unknown@formal@none@1@S@How much information can one hope to communicate over a noisy (or otherwise imperfect) channel?@@@@1@15@@danf@17-8-2009 10411440@unknown@formal@none@1@S@Consider the communications process over a discrete channel.@@@@1@8@@danf@17-8-2009 10411450@unknown@formal@none@1@S@A simple model of the process is shown below:@@@@1@9@@danf@17-8-2009 10411460@unknown@formal@none@1@S@Here ''X'' represents the space of messages transmitted, and ''Y'' the space of messages received during a unit time over our channel.@@@@1@22@@danf@17-8-2009 10411470@unknown@formal@none@1@S@Let p(y|x) be the [[conditional probability]] distribution function of ''Y'' given ''X''.@@@@1@12@@danf@17-8-2009 10411480@unknown@formal@none@1@S@We will consider p(y|x) to be an inherent fixed property of our communications channel (representing the nature of the '''[[Signal noise|noise]]''' of our channel).@@@@1@24@@danf@17-8-2009 10411490@unknown@formal@none@1@S@Then the joint distribution of ''X'' and ''Y'' is completely determined by our channel and by our choice of f(x), the marginal distribution of messages we choose to send over the channel.@@@@1@32@@danf@17-8-2009 10411500@unknown@formal@none@1@S@Under these constraints, we would like to maximize the rate of information, or the '''[[Signal (electrical engineering)|signal]]''', we can communicate over the channel.@@@@1@23@@danf@17-8-2009 10411510@unknown@formal@none@1@S@The appropriate measure for this is the [[mutual information]], and this maximum mutual information is called the '''[[channel capacity]]''' and is given by:@@@@1@23@@danf@17-8-2009 10411520@unknown@formal@none@1@S@: C = \\max_{f} I(X;Y).\\! 
@@@@1@6@@danf@17-8-2009 10411530@unknown@formal@none@1@S@This capacity has the following property related to communicating at information rate ''R'' (where ''R'' is usually bits per symbol).@@@@1@20@@danf@17-8-2009 10411540@unknown@formal@none@1@S@For any information rate ''R < C'' and coding error ε > 0, for large enough ''N'', there exists a code of length ''N'' and rate ≥ R and a decoding algorithm, such that the maximal probability of block error is ≤ ε; that is, it is always possible to transmit with arbitrarily small block error.@@@@1@56@@danf@17-8-2009 10411550@unknown@formal@none@1@S@In addition, for any rate ''R > C'', it is impossible to transmit with arbitrarily small block error.@@@@1@18@@danf@17-8-2009 10411560@unknown@formal@none@1@S@'''[[Channel code|Channel coding]]''' is concerned with finding such nearly optimal [[error detection and correction|codes]] that can be used to transmit data over a noisy channel with a small coding error at a rate near the channel capacity.@@@@1@37@@danf@17-8-2009 10411570@unknown@formal@none@1@S@====Channel capacity of particular model channels====@@@@1@6@@danf@17-8-2009 10411580@unknown@formal@none@1@S@* A continuous-time analog communications channel subject to Gaussian noise — see [[Shannon–Hartley theorem]].@@@@1@14@@danf@17-8-2009 10411590@unknown@formal@none@1@S@* A [[binary symmetric channel]] (BSC) with crossover probability ''p'' is a binary input, binary output channel that flips the input bit with probability '' p''.@@@@1@26@@danf@17-8-2009 10411600@unknown@formal@none@1@S@The BSC has a capacity of 1 - H_\\mbox{b}(p) bits per channel use, where H_\\mbox{b} is the [[binary entropy function]]:@@@@1@20@@danf@17-8-2009 10411610@unknown@formal@none@1@S@::@@@@1@1@@danf@17-8-2009 10411620@unknown@formal@none@1@S@* A binary erasure channel (BEC) with erasure probability '' p '' is a binary input, ternary output channel.@@@@1@19@@danf@17-8-2009 10411630@unknown@formal@none@1@S@The possible channel outputs are ''0'', ''1'', and a third symbol 'e' called an erasure.@@@@1@15@@danf@17-8-2009 10411640@unknown@formal@none@1@S@The erasure represents complete loss of information about an input bit.@@@@1@11@@danf@17-8-2009 10411650@unknown@formal@none@1@S@The capacity of the BEC is ''1 - p'' bits per channel use.@@@@1@13@@danf@17-8-2009 10411660@unknown@formal@none@1@S@::@@@@1@1@@danf@17-8-2009 10411670@unknown@formal@none@1@S@==Applications to other fields==@@@@1@4@@danf@17-8-2009 10411680@unknown@formal@none@1@S@===Intelligence uses and secrecy applications===@@@@1@5@@danf@17-8-2009 10411690@unknown@formal@none@1@S@Information theoretic concepts apply to [[cryptography]] and [[cryptanalysis]].@@@@1@8@@danf@17-8-2009 10411700@unknown@formal@none@1@S@[[Turing]]'s information unit, the [[Ban (information)|ban]], was used in the [[Ultra]] project, breaking the German [[Enigma machine]] code and hastening the [[Victory in Europe Day|end of WWII in Europe]].@@@@1@29@@danf@17-8-2009 10411710@unknown@formal@none@1@S@Shannon himself defined an important concept now called the [[unicity distance]].@@@@1@11@@danf@17-8-2009 10411720@unknown@formal@none@1@S@Based on the [[redundancy (information theory)|redundancy]] of the [[plaintext]], it attempts to give a minimum amount of [[ciphertext]] necessary to ensure unique decipherability.@@@@1@23@@danf@17-8-2009 10411730@unknown@formal@none@1@S@Information theory leads us to believe it is much more difficult to keep secrets than it might first appear.@@@@1@19@@danf@17-8-2009 10411740@unknown@formal@none@1@S@A 
[[brute force attack]] can break systems based on [[public-key cryptography|asymmetric key algorithms]] or on most commonly used methods of [[symmetric-key algorithm|symmetric key algorithms]] (sometimes called secret key algorithms), such as [[block cipher]]s.@@@@1@33@@danf@17-8-2009 10411750@unknown@formal@none@1@S@The security of all such methods currently comes from the assumption that no known attack can break them in a practical amount of time.@@@@1@24@@danf@17-8-2009 10411760@unknown@formal@none@1@S@[[Information theoretic security]] refers to methods such as the [[one-time pad]] that are not vulnerable to such brute force attacks.@@@@1@20@@danf@17-8-2009 10411770@unknown@formal@none@1@S@In such cases, the positive conditional [[mutual information]] between the [[plaintext]] and [[ciphertext]] (conditioned on the [[key (cryptography)| key]]) can ensure proper transmission, while the unconditional mutual information between the plaintext and ciphertext remains zero, resulting in absolutely secure communications.@@@@1@40@@danf@17-8-2009 10411780@unknown@formal@none@1@S@In other words, an eavesdropper would not be able to improve his or her guess of the plaintext by gaining knowledge of the ciphertext but not of the key.@@@@1@29@@danf@17-8-2009 10411790@unknown@formal@none@1@S@However, as in any other cryptographic system, care must be used to correctly apply even information-theoretically secure methods; the [[Venona project]] was able to crack the one-time pads of the [[Soviet Union]] due to their improper reuse of key material.@@@@1@40@@danf@17-8-2009 10411800@unknown@formal@none@1@S@===Pseudorandom number generation===@@@@1@3@@danf@17-8-2009 10411810@unknown@formal@none@1@S@[[Pseudorandom number generator]]s are widely available in computer language libraries and application programs.@@@@1@13@@danf@17-8-2009 10411820@unknown@formal@none@1@S@They are, almost universally, unsuited to cryptographic use as they do not evade the deterministic nature of modern computer equipment and software.@@@@1@22@@danf@17-8-2009 10411830@unknown@formal@none@1@S@A class of improved random number generators is termed [[Cryptographically secure pseudorandom number generator]]s, but even they require [[random seed]]s external to the software to work as intended.@@@@1@28@@danf@17-8-2009 10411840@unknown@formal@none@1@S@These can be obtained via [[extractor]]s, if done carefully.@@@@1@9@@danf@17-8-2009 10411850@unknown@formal@none@1@S@The measure of sufficient randomness in extractors is [[min-entropy]], a value related to Shannon entropy through [[Rényi entropy]]; Rényi entropy is also used in evaluating randomness in cryptographic systems.@@@@1@29@@danf@17-8-2009 10411860@unknown@formal@none@1@S@Although related, the distinctions among these measures mean that a [[random variable]] with high Shannon entropy is not necessarily satisfactory for use in an extractor and so for cryptographic uses.@@@@1@30@@danf@17-8-2009 10411870@unknown@formal@none@1@S@===Miscellaneous applications===@@@@1@2@@danf@17-8-2009 10411880@unknown@formal@none@1@S@Information theory also has applications in [[Gambling and information theory|gambling and investing]], [[black hole information paradox|black holes]], [[bioinformatics]], and [[music]].@@@@1@20@@danf@17-8-2009 10420010@unknown@formal@none@1@S@
Italian language
@@@@1@2@@danf@17-8-2009 10420020@unknown@formal@none@1@S@'''Italian''' (, or ''lingua italiana'') is a [[Romance languages|Romance language]] spoken as a [[first language]] by about 63 million people, primarily in [[Italy]].@@@@1@23@@danf@17-8-2009 10420030@unknown@formal@none@1@S@In [[Switzerland]], Italian is one of four [[Linguistic geography of Switzerland|official language]]s.@@@@1@12@@danf@17-8-2009 10420040@unknown@formal@none@1@S@It is also the official language of [[San Marino]].@@@@1@9@@danf@17-8-2009 10420050@unknown@formal@none@1@S@It is the primary language of the [[Vatican City]].@@@@1@9@@danf@17-8-2009 10420060@unknown@formal@none@1@S@Standard Italian, adopted by the state after the [[unification of Italy]], is based on [[Tuscan dialect|Tuscan]] and is somewhat intermediate between [[Italo-Western|Italo-Dalmatian languages]] of the [[Mezzogiorno|South]] and [[Northern Italian dialects]] of the [[Northern Italy|North]].@@@@1@34@@danf@17-8-2009 10420070@unknown@formal@none@1@S@Unlike most other Romance languages, Italian has retained the contrast between short and [[consonant length|long consonants]] which existed in Latin.@@@@1@20@@danf@17-8-2009 10420080@unknown@formal@none@1@S@As in most [[Romance languages]], [[stress (linguistics)|stress]] is distinctive.@@@@1@9@@danf@17-8-2009 10420090@unknown@formal@none@1@S@Of the Romance languages, Italian is considered to be one of the closest resembling [[Latin]] in terms of [[vocabulary]].@@@@1@19@@danf@17-8-2009 10420100@unknown@formal@none@1@S@According to Ethnologue, lexical similarity is 89% with [[French language|French]], 87% with [[Catalan language|Catalan]], 85% with [[Sardinian language|Sardinian]], 82% with [[Spanish language|Spanish]], 78% with Rheto-Romance, and 77% with Romanian.@@@@1@29@@danf@17-8-2009 10420110@unknown@formal@none@1@S@It is affectionately called ''il parlar gentile'' (the gentle language) by its speakers.@@@@1@13@@danf@17-8-2009 10420120@unknown@formal@none@1@S@==Writing system==@@@@1@2@@danf@17-8-2009 10420130@unknown@formal@none@1@S@Italian is written using the [[Latin alphabet]].@@@@1@7@@danf@17-8-2009 10420140@unknown@formal@none@1@S@The letters ''J'', ''K'', ''W'', ''X'' and ''Y'' are not considered part of the standard [[Italian alphabet]], but appear in loanwords (such as ''jeans'', ''whisky'', ''taxi'').@@@@1@26@@danf@17-8-2009 10420150@unknown@formal@none@1@S@''X'' has become a commonly used letter in genuine Italian words with the prefix ''extra-''.@@@@1@15@@danf@17-8-2009 10420160@unknown@formal@none@1@S@''J'' in Italian is an old-fashioned orthographic variant of ''I'', appearing in the first name "Jacopo" as well as in some Italian place names, e.g., the towns of [[Bajardo]], [[Bojano]], [[Joppolo]], [[Jesolo]], [[Jesi]], among numerous others, and in the alternate spelling ''Mar Jonio'' (also spelled ''Mar Ionio'') for the [[Ionian Sea]].@@@@1@51@@danf@17-8-2009 10420170@unknown@formal@none@1@S@''J'' may also appear in many words from different dialects, but its use is discouraged in contemporary Italian, and it is not part of the standard 21-letter contemporary Italian alphabet.@@@@1@30@@danf@17-8-2009 10420180@unknown@formal@none@1@S@Each of these foreign letters had an Italian equivalent spelling: ''gi'' for ''j'', ''c'' or ''ch'' for ''k'', ''u'' or ''v'' for ''w'' (depending on what sound it makes), ''s'', ''ss'', or ''cs'' for ''x'', and ''i'' for ''y''.@@@@1@39@@danf@17-8-2009 10420190@unknown@formal@none@1@S@* Italian uses the [[acute accent]] over the letter ''E'' (as in 
''perché'', why/because) to indicate a front mid-close vowel, and the [[grave accent]] (as in ''tè'', tea) to indicate a front mid-open vowel.@@@@1@34@@danf@17-8-2009 10420200@unknown@formal@none@1@S@The [[grave accent]] is also used on letters ''A'', ''I'', ''O'', and ''U'' to mark [[stress (linguistics)|stress]] when it falls on the final vowel of a word (for instance ''gioventù'', youth).@@@@1@31@@danf@17-8-2009 10420210@unknown@formal@none@1@S@Typically, the penultimate syllable is stressed.@@@@1@6@@danf@17-8-2009 10420220@unknown@formal@none@1@S@If syllables other than the last one are stressed, the accent is not mandatory, unlike in [[Spanish language|Spanish]], and, in virtually all cases, it is omitted.@@@@1@26@@danf@17-8-2009 10420230@unknown@formal@none@1@S@In some cases, when the word is ambiguous (as ''principi''), the accent mark is sometimes used in order to disambiguate its meaning (in this case, ''prìncipi'', princes, or ''princìpi'', principles).@@@@1@30@@danf@17-8-2009 10420240@unknown@formal@none@1@S@This is, however, not compulsory.@@@@1@5@@danf@17-8-2009 10420250@unknown@formal@none@1@S@Rare words with three or more syllables can confuse Italians themselves, and the pronunciation of [[Istanbul]] is a common example of a word in which placement of stress is not clearly established.@@@@1@32@@danf@17-8-2009 10420260@unknown@formal@none@1@S@Turkish, like French, tends to put the accent on ultimate syllable, but Italian doesn't.@@@@1@14@@danf@17-8-2009 10420270@unknown@formal@none@1@S@So we can hear "Istànbul" or "Ìstanbul".@@@@1@7@@danf@17-8-2009 10420280@unknown@formal@none@1@S@Another instance is the American State of [[Florida]]: the correct way to pronounce it in Italian is like in Spanish, "Florìda", but since there is an Italian word meaning the same ("flourishing"), "flòrida", and because of the influence of English, most Italians pronounce it that way.@@@@1@46@@danf@17-8-2009 10420290@unknown@formal@none@1@S@Dictionaries give the latter as an alternative pronunciation.@@@@1@8@@danf@17-8-2009 10420300@unknown@formal@none@1@S@* The letter ''H'' at the beginning of a word is used to distinguish ''ho'', ''hai'', ''ha'', ''hanno'' (present indicative of ''avere'', 'to have') from ''o'' ('or'), ''ai'' ('to the'), ''a'' ('to'), ''anno'' ('year').@@@@1@34@@danf@17-8-2009 10420310@unknown@formal@none@1@S@In the spoken language this letter is always silent for the cases given above.@@@@1@14@@danf@17-8-2009 10420320@unknown@formal@none@1@S@''H'' is also used in combinations with other letters (see below), but no [[phoneme]] {{IPA|[h]}} exists in Italian.@@@@1@18@@danf@17-8-2009 10420330@unknown@formal@none@1@S@In foreign words entered in common use, like "hotel" or "hovercraft", the H is commonly silent, so they are pronounced as {{IPA|/oˈtɛl/}} and {{IPA|/ˈɔverkraft/}}@@@@1@24@@danf@17-8-2009 10420340@unknown@formal@none@1@S@* The letter ''Z'' represents {{IPA|/ʣ/}}, for example: ''Zanzara'' {{IPA|/dzan'dzaɾa/}} (mosquito), or {{IPA|/ʦ/}}, for example: ''Nazione'' {{IPA|/naˈttsjone/}} (nation), depending on context, though there are few [[minimal pair]]s.@@@@1@27@@danf@17-8-2009 10420350@unknown@formal@none@1@S@The same goes for ''S'', which can represent {{IPA|/s/}} or {{IPA|/z/}}.@@@@1@11@@danf@17-8-2009 10420360@unknown@formal@none@1@S@However, these two phonemes are in [[complementary distribution]] everywhere except between two vowels in the same word, and even in such environment there are extremely few minimal pairs, so that this distinction is being lost in many 
varieties.@@@@1@38@@danf@17-8-2009 10420370@unknown@formal@none@1@S@* The letters ''C'' and ''G'' represent [[affricate]]s: [[Voiceless postalveolar affricate|{{IPA|/ʧ/}}]] as in "chair" and [[Voiced postalveolar affricate|{{IPA|/ʤ/}}]] as in "gem", respectively, before the [[front vowel]]s ''I'' and ''E''.@@@@1@29@@danf@17-8-2009 10420380@unknown@formal@none@1@S@They are pronounced as [[plosive]]s {{IPA|/k/}}, {{IPA|/g/}} (as in "call" and "gall") otherwise.@@@@1@13@@danf@17-8-2009 10420390@unknown@formal@none@1@S@Front/back vowel rules for ''C'' and ''G'' are similar in [[French language|French]], [[Romanian language|Romanian]], [[Spanish language|Spanish]], and to some extent [[English language|English]] (including [[Old English]]).@@@@1@25@@danf@17-8-2009 10420400@unknown@formal@none@1@S@[[swedish language|Swedish]] and [[Norwegian language|Norwegian]] have similar rules for ''K'' and ''G''.@@@@1@12@@danf@17-8-2009 10420410@unknown@formal@none@1@S@(See also [[palatalization]].)@@@@1@3@@danf@17-8-2009 10420420@unknown@formal@none@1@S@* However, an ''H'' can be added between ''C'' or ''G'' and ''E'' or ''I'' to represent a plosive, and an ''I'' can be added between ''C'' or ''G'' and ''A'', ''O'' or ''U'' to signal that the consonant is an affricate.@@@@1@42@@danf@17-8-2009 10420430@unknown@formal@none@1@S@For example:@@@@1@2@@danf@17-8-2009 10420440@unknown@formal@none@1@S@:Note that the ''H'' is [[silent letter|silent]] in the digraphs ''[[ch (digraph)|CH]]'' and ''[[gh (digraph)|GH]]'', as also the ''I'' in ''cia'', ''cio'', ''ciu'' and even ''cie'' is not pronounced as a separate vowel, unless it carries the primary stress.@@@@1@39@@danf@17-8-2009 10420450@unknown@formal@none@1@S@For example, it is silent in ''[[ciao]]'' {{IPA|/ˈʧa.o/}} and cielo {{IPA|/ˈʧɛ.lo/}}, but it is pronounced in ''farmacia'' {{IPA|/ˌfaɾ.ma.ˈʧi.a/}} and ''farmacie'' {{IPA|/ˌfaɾ.ma.ˈʧi.e/}}.@@@@1@21@@danf@17-8-2009 10420460@unknown@formal@none@1@S@* There are three other special [[digraph (orthography)|digraphs]] in Italian: ''[[gn (digraph)|GN]]'', ''GL'' and ''SC''.@@@@1@15@@danf@17-8-2009 10420470@unknown@formal@none@1@S@''GN'' represents [[Palatal nasal|{{IPA|/ɲ/}}]].@@@@1@4@@danf@17-8-2009 10420480@unknown@formal@none@1@S@''GL'' represents [[Palatal lateral approximant|{{IPA|/ʎ/}}]] only before ''i'', and never at the beginning of a word, except in the [[personal pronoun]] and [[definite article]] ''gli''.@@@@1@25@@danf@17-8-2009 10420490@unknown@formal@none@1@S@(Compare with [[Spanish language|Spanish]] ''ñ'' and ''ll'', [[Portuguese language|Portuguese]] ''nh'' and ''lh''.)@@@@1@12@@danf@17-8-2009 10420500@unknown@formal@none@1@S@''SC'' represents fricative [[Voiceless postalveolar fricative|{{IPA|/ʃ/}}]] before ''i'' or ''e''.@@@@1@10@@danf@17-8-2009 10420510@unknown@formal@none@1@S@Except in the speech of some Northern Italians, all of these are normally [[geminate]] between vowels.@@@@1@16@@danf@17-8-2009 10420520@unknown@formal@none@1@S@* In general, all letters or digraphs represent phonemes rather clearly, and, in standard varieties of Italian, there is little allophonic variation.@@@@1@22@@danf@17-8-2009 10420530@unknown@formal@none@1@S@The most notable exceptions are assimilation of /n/ in point of articulation before consonants, assimilatory voicing of /s/ to following voiced consonants, and vowel length (vowels are long in stressed open syllables, and short elsewhere) — compare with the enormous number of [[allophone]]s of the English phoneme /t/.@@@@1@48@@danf@17-8-2009 
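As a rough illustration of the regularity described above, the following sketch applies only the hard/soft rule for ''C'' and ''G'' (including the ''CH''/''GH'' and ''CI''/''GI'' spellings); it ignores stress, gemination and the remaining digraphs, and the example words are chosen only for illustration.

<source lang="python">
def c_g_sound(word, i):
    """IPA value of the letter C or G at position i, using only the front-vowel rule."""
    letter = word[i].lower()
    soft = {"c": "tʃ", "g": "dʒ"}  # affricate, as in English "chair" / "gem"
    hard = {"c": "k", "g": "g"}    # plosive, as in English "call" / "gall"
    rest = word[i + 1:].lower()
    if rest.startswith("h"):       # CH / GH mark the plosive before E or I
        return hard[letter]
    if rest[:1] in ("e", "i"):     # a following front vowel (or inserted I) marks the affricate
        return soft[letter]
    return hard[letter]            # before A, O, U, a consonant, or at the end of a word

# Illustrative words: cena, ciao and gelo take the affricate; china, ghiro and gatto the plosive.
for w in ("cena", "ciao", "gelo", "china", "ghiro", "gatto"):
    print(w, c_g_sound(w, 0))
</source>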
10420540@unknown@formal@none@1@S@Spelling is clearly phonemic and difficult to mistake given a clear pronunciation.@@@@1@12@@danf@17-8-2009 10420550@unknown@formal@none@1@S@Exceptions are generally only found in foreign borrowings.@@@@1@8@@danf@17-8-2009 10420560@unknown@formal@none@1@S@There are fewer cases of [[dyslexia]] than among speakers of languages such as English , and the concept of a spelling bee is strange to Italians.@@@@1@26@@danf@17-8-2009 10420570@unknown@formal@none@1@S@==History==@@@@1@1@@danf@17-8-2009 10420580@unknown@formal@none@1@S@The history of the Italian language is long, but the modern standard of the language was largely shaped by relatively recent events.@@@@1@22@@danf@17-8-2009 10420590@unknown@formal@none@1@S@The earliest surviving texts which can definitely be called Italian (or more accurately, vernacular, as opposed to its predecessor [[Vulgar Latin]]) are legal formulae from the region of [[province of Benevento|Benevento]] dating from 960-963.@@@@1@34@@danf@17-8-2009 10420600@unknown@formal@none@1@S@What would come to be thought of as Italian was first formalized in the first years of the 14th century through the works of [[Dante Alighieri]], who mixed southern Italian languages, especially [[Sicilian language|Sicilian]], with his native Tuscan in his epic poems known collectively as the ''[[Divine Comedy|Commedia]],'' to which [[Giovanni Boccaccio]] later affixed the title ''Divina''.@@@@1@57@@danf@17-8-2009 10420610@unknown@formal@none@1@S@Dante's much-loved works were read throughout Italy and his written dialect became the "canonical standard" that all educated Italians could understand.@@@@1@21@@danf@17-8-2009 10420620@unknown@formal@none@1@S@Dante is still credited with standardizing the Italian language and, thus, the dialect of [[Tuscany]] became the basis for what would become the official language of Italy.@@@@1@27@@danf@17-8-2009 10420630@unknown@formal@none@1@S@Italy has always had a distinctive dialect for each city since the cities were until recently thought of as [[city-state]]s.@@@@1@20@@danf@17-8-2009 10420640@unknown@formal@none@1@S@The latter now has considerable [[variety (linguistics)|variety]], however.@@@@1@8@@danf@17-8-2009 10420650@unknown@formal@none@1@S@As Tuscan-derived Italian came to be used throughout the nation, features of local speech were naturally adopted, producing various versions of Regional Italian.@@@@1@23@@danf@17-8-2009 10420660@unknown@formal@none@1@S@The most characteristic differences, for instance, between [[Romanesco|Roman Italian]] and [[Milanese|Milanese Italian]] are the [[consonant length|gemination]] of initial consonants and the pronunciation of stressed "e", and of "s" in some cases (e.g. 
''va bene'' "all right": is pronounced {{IPA|[va ˈbːɛne]}} by a Roman, {{IPA|[va ˈbene]}} by a Milanese; ''a casa'' "at home": Roman {{IPA|[a ˈkːasa]}}, Milanese {{IPA|[a ˈkaza]}}).@@@@1@58@@danf@17-8-2009 10420670@unknown@formal@none@1@S@In contrast to the [[Northern Italian language|dialects of northern Italy]], [[southern Italian]] dialects were largely untouched by the Franco-[[Occitan language|Occitan]] influences introduced to Italy, mainly by [[bard]]s from [[France]], during the [[Middle Ages]].@@@@1@33@@danf@17-8-2009 10420680@unknown@formal@none@1@S@Even in the case of Northern Italian dialects, however, scholars are careful not to overstate the effects of outsiders on the natural indigenous developments of the languages.@@@@1@27@@danf@17-8-2009 10420690@unknown@formal@none@1@S@(See [[La Spezia-Rimini Line]].)@@@@1@4@@danf@17-8-2009 10420700@unknown@formal@none@1@S@The economic might and relative advanced development of [[Tuscany]] at the time ([[Late Middle Ages]]) gave its dialect weight, though Venetian remained widespread in medieval Italian commercial life.@@@@1@28@@danf@17-8-2009 10420710@unknown@formal@none@1@S@Also, the increasing cultural relevance of [[Florence, Italy|Florence]] during the periods of '[[Humanism|Umanesimo (Humanism)]]' and the [[Renaissance|Rinascimento (Renaissance)]] made its ''volgare'' (dialect), or rather a refined version of it, a standard in the arts.@@@@1@34@@danf@17-8-2009 10420720@unknown@formal@none@1@S@The re-discovery of Dante's ''[[De vulgari eloquentia]]'' and a renewed interest in linguistics in the 16th century sparked a debate which raged throughout Italy concerning which criteria should be chosen to establish a modern Italian standard to be used as much a literary as a spoken language.@@@@1@48@@danf@17-8-2009 10420730@unknown@formal@none@1@S@Scholars were divided into three factions: the [[purism|purists]], headed by [[Pietro Bembo]] who in his ''[[Gli Asolani]]'' claimed that the language might only be based on the great literary classics (notably, [[Petrarch]], and Boccaccio but not Dante as Bembo believed that the Divine Comedy was not dignified enough as it used elements from other dialects), [[Niccolò Machiavelli]] and other [[Florence|Florentine]]s who preferred the version spoken by ordinary people in their own times, and the [[courtier]]s like [[Baldassarre Castiglione]] and [[Gian Giorgio Trissino]] who insisted that each local vernacular must contribute to the new standard.@@@@1@94@@danf@17-8-2009 10420740@unknown@formal@none@1@S@Eventually Bembo's ideas prevailed, the result being the publication of the first Italian dictionary in 1612 and the foundation of the [[Accademia della Crusca]] in Florence (1582-3), the official legislative body of the Italian language.@@@@1@35@@danf@17-8-2009 10420750@unknown@formal@none@1@S@Italian literature's first modern novel, [[The Betrothed|''I Promessi Sposi'']] (The Betrothed), by [[Alessandro Manzoni]], further defined the standard by "rinsing" his Milanese "in the waters of the [[Arno River|Arno]]" ([[Florence]]'s river), as he states in the Preface to his 1840 edition.@@@@1@41@@danf@17-8-2009 10420760@unknown@formal@none@1@S@After unification, a huge number of civil servants and soldiers recruited from all over the country introduced many more words and idioms from their home dialects ("[[ciao]]" is [[Venetian language|Venetian]], "[[panettone]]" is [[Milanese]] etc.).@@@@1@34@@danf@17-8-2009 10420770@unknown@formal@none@1@S@==Classification==@@@@1@1@@danf@17-8-2009 
10420780@unknown@formal@none@1@S@Italian is most closely related to the other two Italo-Dalmatian languages, [[Sicilian language|Sicilian]] and the extinct [[Dalmatian language|Dalmatian]].@@@@1@18@@danf@17-8-2009 10420790@unknown@formal@none@1@S@The three are part of the [[Italo-Western languages|Italo-Western]] grouping of the [[Romance languages]], which are a subgroup of the [[Italic languages|Italic]] branch of [[Indo-European language family|Indo-European]].@@@@1@26@@danf@17-8-2009 10420800@unknown@formal@none@1@S@==Geographic distribution==@@@@1@2@@danf@17-8-2009 10420810@unknown@formal@none@1@S@The total number of native speakers of Italian is between 60 and 70 million.@@@@1@14@@danf@17-8-2009 10420820@unknown@formal@none@1@S@Speakers who use Italian as a second or cultural language are estimated at around 110-120 million.@@@@1@16@@danf@17-8-2009 10420830@unknown@formal@none@1@S@Italian is the official language of [[Italy]] and [[San Marino]], and one of the official languages of [[Switzerland]], spoken mainly in the [[Canton Ticino|Ticino]] and [[Graubünden|Grigioni]] cantons, a region referred to as [[Italian Switzerland]].@@@@1@33@@danf@17-8-2009 10420840@unknown@formal@none@1@S@It is also the second official language in some areas of [[Istria]], in [[Slovenia]] and [[Croatia]], where an Italian minority exists.@@@@1@21@@danf@17-8-2009 10420850@unknown@formal@none@1@S@It is the primary language of the [[Vatican City]] and is widely used and taught in [[Monaco]] and [[Malta]].@@@@1@19@@danf@17-8-2009 10420860@unknown@formal@none@1@S@It is also widely understood in France, with over one million speakers (especially in [[Corsica]] and the [[County of Nice]], areas that historically spoke [[Italian dialects]] before annexation to [[France]]), and in [[Albania]].@@@@1@33@@danf@17-8-2009 10420870@unknown@formal@none@1@S@Italian is also spoken by some in former Italian colonies in [[Africa]] ([[Libya]], [[Somalia]] and [[Eritrea]]).@@@@1@16@@danf@17-8-2009 10420880@unknown@formal@none@1@S@However, its use has sharply dropped off since the colonial period.@@@@1@11@@danf@17-8-2009 10420890@unknown@formal@none@1@S@In [[Eritrea]], [[Italian Language|Italian]] is widely understood.@@@@1@8@@danf@17-8-2009 10420900@unknown@formal@none@1@S@In fact, for fifty years, during the colonial period, Italian was the language of instruction, but [[as of 1997]], there is only one Italian language school remaining, with 470 pupils.@@@@1@30@@danf@17-8-2009 10420910@unknown@formal@none@1@S@In [[Somalia]], Italian used to be a major language, but due to the civil war and lack of education, only the older generation still uses it.@@@@1@26@@danf@17-8-2009 10420920@unknown@formal@none@1@S@Italian and [[Italian dialects]] are widely used by Italian immigrants and many of their descendants (see ''[[Italians]]'') living throughout [[Western Europe]] (especially [[France]], [[Germany]], [[Belgium]], [[Switzerland]], the [[Britalian|United Kingdom]] and [[Luxembourg]]), the [[Italian Americans|United States]], [[Italian Canadians|Canada]], [[Italian Australians|Australia]], and [[Latin America]] (especially [[Uruguay]], [[Italian Brazilians|Brazil]], [[Argentina]], and [[Venezuela]]).@@@@1@49@@danf@17-8-2009 10420930@unknown@formal@none@1@S@In the United States, Italian speakers are most commonly found in four cities: [[Boston]] (7,000), [[Chicago]] (12,000), [[New York City]] (140,000), and [[Philadelphia]] (15,000).@@@@1@24@@danf@17-8-2009 10420940@unknown@formal@none@1@S@In Canada there are large Italian-speaking
communities in [[Montreal]] (120,000) and [[Toronto]] (195,000).@@@@1@13@@danf@17-8-2009 10420950@unknown@formal@none@1@S@Italian is the second most commonly spoken language in Australia, where 353,605 [[Italian Australian]]s, or 1.9% of the population, reported speaking Italian at home in the 2001 [[Census in Australia|Census]].@@@@1@29@@danf@17-8-2009 10420960@unknown@formal@none@1@S@In 2001 there were 130,000 Italian speakers in [[Melbourne]], and 90,000 in [[Sydney]].@@@@1@13@@danf@17-8-2009 10420970@unknown@formal@none@1@S@===Italian language education===@@@@1@3@@danf@17-8-2009 10420980@unknown@formal@none@1@S@Italian is widely taught in many schools around the world, but rarely as the first non-native language of pupils; in fact, Italian is generally the fourth or fifth most commonly taught second language in the world.@@@@1@34@@danf@17-8-2009 10420990@unknown@formal@none@1@S@In [[anglophone]] parts of [[Canada]], Italian is, after [[French language|French]], the third most taught language.@@@@1@15@@danf@17-8-2009 10421000@unknown@formal@none@1@S@In [[francophone]] Canada it is third after [[English language|English]].@@@@1@9@@danf@17-8-2009 10421010@unknown@formal@none@1@S@In the [[United States]] and the [[United Kingdom]], Italian ranks fourth (after [[Spanish language|Spanish]], French, and [[German language|German]], and after French, German, and Spanish, respectively).@@@@1@18@@danf@17-8-2009 10421020@unknown@formal@none@1@S@Throughout the world, Italian is the fifth most taught non-native language, after [[English language|English]], French, Spanish, and German.@@@@1@18@@danf@17-8-2009 10421030@unknown@formal@none@1@S@In the [[European Union]], Italian is spoken as a mother tongue by 13% of the population (64 million, mainly in Italy itself) and as a second language by 3% (14 million); among EU member states, it is most likely to be desired (and therefore learned) as a second language in [[Malta]] (61%), [[Croatia]] (14%), [[Slovenia]] (12%), [[Austria]] (11%), [[Romania]] (8%), [[France]] (6%), and [[Greece]] (6%).@@@@1@65@@danf@17-8-2009 10421040@unknown@formal@none@1@S@It is also an important second language in [[Albania]] and [[Switzerland]], which are not EU members or candidates.@@@@1@18@@danf@17-8-2009 10421050@unknown@formal@none@1@S@===Influence and derived languages===@@@@1@4@@danf@17-8-2009 10421060@unknown@formal@none@1@S@From the late 19th to the mid 20th century, thousands of Italians settled in Argentina, Uruguay and southern Brazil, where they formed a very strong physical and cultural presence (see the [[Italian diaspora]]).@@@@1@33@@danf@17-8-2009 10421070@unknown@formal@none@1@S@In some cases, colonies were established where variants of [[Italian dialects]] were used, and some continue to use a derived dialect.@@@@1@21@@danf@17-8-2009 10421080@unknown@formal@none@1@S@Examples are [[Rio Grande do Sul]], [[Brazil]], where [[Talian]] is used, and the town of [[Chipilo]] near Puebla, [[Mexico]]; each continues to use a derived form of [[Venetian language|Venetian]] dating back to the 19th century.@@@@1@37@@danf@17-8-2009 10421090@unknown@formal@none@1@S@Other examples are [[Cocoliche]], an Italian-Spanish [[pidgin]] once spoken in [[Argentina]], especially in [[Buenos Aires]], and [[Lunfardo]].@@@@1@18@@danf@17-8-2009 10421100@unknown@formal@none@1@S@[[Rioplatense Spanish]], and particularly the speech of the city of Buenos Aires, has intonation patterns that resemble those of Italian dialects, because Argentina has had a constant, large influx of Italian settlers since the
second half of the nineteenth century; initially primarily from Northern Italy and then, since the beginning of the twentieth century, mostly from Southern Italy.@@@@1@60@@danf@17-8-2009 10421110@unknown@formal@none@1@S@===Lingua Franca===@@@@1@2@@danf@17-8-2009 10421120@unknown@formal@none@1@S@Starting in late [[medieval]] times, Italian language variants replaced Latin to become the primary commercial language for much of Europe and the Mediterranean Sea (especially the Tuscan and Venetian variants).@@@@1@29@@danf@17-8-2009 10421130@unknown@formal@none@1@S@This became solidified during the [[Renaissance]] with the strength of Italian banking and the rise of [[Renaissance humanism|humanism]] in the arts.@@@@1@21@@danf@17-8-2009 10421140@unknown@formal@none@1@S@During the period of the Renaissance, Italy held artistic sway over the rest of Europe.@@@@1@15@@danf@17-8-2009 10421150@unknown@formal@none@1@S@All educated European gentlemen were expected to make the [[Grand Tour]], visiting Italy to see its great historical monuments and works of art.@@@@1@23@@danf@17-8-2009 10421160@unknown@formal@none@1@S@It thus became expected that educated Europeans would learn at least some Italian; the English poet [[John Milton]], for instance, wrote some of his early poetry in Italian.@@@@1@28@@danf@17-8-2009 10421170@unknown@formal@none@1@S@In England, Italian became the second most common modern language to be learned, after [[French language|French]] (though the classical languages, [[Latin]] and [[Greek language|Greek]], came first).@@@@1@26@@danf@17-8-2009 10421180@unknown@formal@none@1@S@However, by the late eighteenth century, Italian tended to be replaced by [[German language|German]] as the second modern language on the curriculum.@@@@1@22@@danf@17-8-2009 10421190@unknown@formal@none@1@S@Yet Italian [[loanword]]s continue to be used in most other [[European languages]] in matters of art and music.@@@@1@18@@danf@17-8-2009 10421200@unknown@formal@none@1@S@Today, the Italian language continues to be used as a [[lingua franca]] in some environments.@@@@1@15@@danf@17-8-2009 10421210@unknown@formal@none@1@S@Within the [[Catholic church]], Italian is known by a large part of the ecclesiastic hierarchy and is used in place of [[Latin]] in some official documents.@@@@1@26@@danf@17-8-2009 10421220@unknown@formal@none@1@S@The presence of Italian as the primary language in the [[Vatican City]] indicates not only use within the [[Holy See]], but also throughout the world where an episcopal seat is present.@@@@1@31@@danf@17-8-2009 10421230@unknown@formal@none@1@S@It continues to be used in [[music]] and [[opera]].@@@@1@9@@danf@17-8-2009 10421240@unknown@formal@none@1@S@Other contexts where Italian is sometimes used as a means of communication are some sports (occasionally in [[Football (association)|football]] and [[motorsports]]) and the [[design]] and [[fashion]] industries.@@@@1@28@@danf@17-8-2009 10421250@unknown@formal@none@1@S@==Dialects==@@@@1@1@@danf@17-8-2009 10421260@unknown@formal@none@1@S@In Italy, all [[Romance languages]] spoken as the vernacular, other than standard Italian and other unrelated, non-Italian languages, are termed "Italian dialects".@@@@1@22@@danf@17-8-2009 10421270@unknown@formal@none@1@S@Many Italian dialects are, in fact, historical languages in their own right.@@@@1@12@@danf@17-8-2009 10421280@unknown@formal@none@1@S@These include recognized language groups such as [[Friulian language|Friulian]], [[Neapolitan language|Neapolitan]], [[Sardinian language|Sardinian]], [[Sicilian
language|Sicilian]], [[Venetian language|Venetian]], and others, and regional variants of these languages such as [[Calabrian languages|Calabrian]].@@@@1@29@@danf@17-8-2009 10421290@unknown@formal@none@1@S@The division between dialect and language has been used by scholars (such as by [[Francesco Bruni]]) to distinguish between the languages that made up the Italian [[koine]], and those which had very little or no part in it, such as [[Albanian language|Albanian]], [[Greek language|Greek]], [[German language|German]], [[Ladin language|Ladin]], and [[Occitan language|Occitan]], which are still spoken by minorities.@@@@1@57@@danf@17-8-2009 10421300@unknown@formal@none@1@S@Dialects are generally not used for general mass communication and are usually limited to native speakers in informal contexts.@@@@1@19@@danf@17-8-2009 10421310@unknown@formal@none@1@S@In the past, speaking in dialect was often deprecated as a sign of poor education.@@@@1@15@@danf@17-8-2009 10421320@unknown@formal@none@1@S@Younger generations, especially those under 35 (though it may vary in different areas), speak almost exclusively standard Italian in all situations, usually with local accents and idioms.@@@@1@27@@danf@17-8-2009 10421330@unknown@formal@none@1@S@Regional differences can be recognized by various factors: the openness of vowels, the length of the consonants, and influence of the local dialect (for example, ''annà'' replaces ''andare'' in the area of Rome for the infinitive "to go").@@@@1@38@@danf@17-8-2009 10421340@unknown@formal@none@1@S@==Sounds==@@@@1@1@@danf@17-8-2009 10421350@unknown@formal@none@1@S@{{IPA notice|lang=it}}@@@@1@2@@danf@17-8-2009 10421360@unknown@formal@none@1@S@===Vowels===@@@@1@1@@danf@17-8-2009 10421370@unknown@formal@none@1@S@Italian has seven [[vowel]] phonemes: {{IPA|/a/}}, {{IPA|/e/}}, {{IPA|/ɛ/}}, {{IPA|/i/}}, {{IPA|/o/}}, {{IPA|/ɔ/}}, {{IPA|/u/}}.@@@@1@12@@danf@17-8-2009 10421380@unknown@formal@none@1@S@The pairs {{IPA|/e/}}-{{IPA|/ɛ/}} and {{IPA|/o/}}-{{IPA|/ɔ/}} are seldom distinguished in writing and often confused, even though most varieties of Italian employ both phonemes consistently.@@@@1@23@@danf@17-8-2009 10421390@unknown@formal@none@1@S@Compare, for example: "perché" {{IPA|[perˈkɛ]}} (why, because) and "senti" {{IPA|[ˈsenti]}} (you listen, you are listening, listen!), employed by some northern speakers, with {{IPA|[perˈke]}} and {{IPA|[ˈsɛnti]}}, as pronounced by most central and southern speakers.@@@@1@33@@danf@17-8-2009 10421400@unknown@formal@none@1@S@As a result, the usage is strongly indicative of a person's origin.@@@@1@12@@danf@17-8-2009 10421410@unknown@formal@none@1@S@The standard (Tuscan) usage of these vowels is listed in dictionaries, and employed outside Tuscany mainly by specialists, especially actors and very few (television) journalists.@@@@1@25@@danf@17-8-2009 10421420@unknown@formal@none@1@S@These are truly different [[phonemes]], however: compare {{IPA|/ˈpeska/}} (fishing) and {{IPA|/ˈpɛska/}} (peach), both spelled ''pesca''.@@@@1@16@@danf@17-8-2009 10421430@unknown@formal@none@1@S@Similarly, {{IPA|/ˈbotte/}} ('barrel') and {{IPA|/ˈbɔtte/}} ('beatings'), both spelled ''botte'', discriminate {{IPA|/o/}} and {{IPA|/ɔ/}}.@@@@1@14@@danf@17-8-2009 10421440@unknown@formal@none@1@S@In general, each vowel in a combination is usually pronounced separately.@@@@1@9@@danf@17-8-2009 10421450@unknown@formal@none@1@S@[[Diphthong]]s exist (e.g.
''uo'', ''iu'', ''ie'', ''ai''), but are limited to an unstressed ''u'' or ''i'' before or after a stressed vowel.@@@@1@22@@danf@17-8-2009 10421460@unknown@formal@none@1@S@The unstressed ''u'' in a diphthong approximates the English semivowel ''w'', the unstressed ''i'' approximates the semivowel ''y''.@@@@1@18@@danf@17-8-2009 10421470@unknown@formal@none@1@S@E.g.: ''buono'' {{IPA|[ˈbwɔno]}}, ''ieri'' {{IPA|[ˈjɛri]}}.@@@@1@5@@danf@17-8-2009 10421480@unknown@formal@none@1@S@[[Triphthong]]s exist in Italian as well, like "contin''uia''mo" ("we continue").@@@@1@10@@danf@17-8-2009 10421490@unknown@formal@none@1@S@Three vowel combinations exist only in the form semiconsonant ({{IPA|/j/}} or {{IPA|/w/}}), followed by a vowel, followed by a desinence vowel (usually {{IPA|/i/}}), as in ''miei'', ''suoi'', or two semiconsonants followed by a vowel, as the group ''-uia-'' exemplified above, or ''-iuo-'' in the word ''aiuola''.@@@@1@46@@danf@17-8-2009 10421500@unknown@formal@none@1@S@===Mobile diphthongs===@@@@1@2@@danf@17-8-2009 10421510@unknown@formal@none@1@S@Many Latin words with a short ''e'' or ''o'' have Italian counterparts with a mobile diphthong (''ie'' and ''uo'' respectively).@@@@1@20@@danf@17-8-2009 10421520@unknown@formal@none@1@S@When the vowel sound is stressed, it is pronounced and written as a diphthong; when not stressed, it is pronounced and written as a single vowel.@@@@1@26@@danf@17-8-2009 10421530@unknown@formal@none@1@S@So Latin ''focus'' gave rise to Italian ''fuoco'' (meaning both "fire" and "optical focus"): when unstressed, as in ''focale'' ("focal") the "o" remains alone.@@@@1@24@@danf@17-8-2009 10421540@unknown@formal@none@1@S@Latin ''pes'' (more precisely its accusative form ''pedem'') is the source of Italian ''piede'' (foot): but unstressed "e" was left unchanged in ''pedone'' (pedestrian) and ''pedale'' (pedal).@@@@1@27@@danf@17-8-2009 10421550@unknown@formal@none@1@S@From Latin ''iocus'' comes Italian ''giuoco'' ("play", "game"), though in this case ''gioco'' is more common: ''giocare'' means "to play (a game)".@@@@1@22@@danf@17-8-2009 10421560@unknown@formal@none@1@S@From Latin ''homo'' comes Italian ''uomo'' (man), but also ''umano'' (human) and ''ominide'' (hominid).@@@@1@14@@danf@17-8-2009 10421570@unknown@formal@none@1@S@From Latin ''ovum'' comes Italian ''uovo'' (egg) and ''ovaie'' (ovaries).@@@@1@10@@danf@17-8-2009 10421580@unknown@formal@none@1@S@(The same phenomenon occurs in [[Spanish language|Spanish]]: ''juego'' (play, game) and ''jugar'' (to play), ''nieve'' (snow) and ''nevar'' (to snow)).@@@@1@20@@danf@17-8-2009 10421590@unknown@formal@none@1@S@===Consonants===@@@@1@1@@danf@17-8-2009 10421600@unknown@formal@none@1@S@Two symbols in a table cell denote the voiceless and voiced consonant, respectively.@@@@1@13@@danf@17-8-2009 10421610@unknown@formal@none@1@S@Nasals undergo assimilation when followed by a consonant, e.g., when preceding a velar ({{IPA|/k/}} or {{IPA|/g/}}) only {{IPA|[ŋ]}} appears, etc.@@@@1@20@@danf@17-8-2009 10421620@unknown@formal@none@1@S@Italian has geminate, or double, consonants, which are distinguished by [[Consonant length|length]].@@@@1@12@@danf@17-8-2009 10421630@unknown@formal@none@1@S@Length is distinctive for all consonants except for {{IPA|/ʃ/}}, {{IPA|/ʦ/}}, {{IPA|/ʣ/}}, {{IPA|/ʎ/}} {{IPA|/ɲ/}}, which are always geminate, and {{IPA|/z/}} which is always single.@@@@1@23@@danf@17-8-2009 10421640@unknown@formal@none@1@S@Geminate plosives and affricates are realised as lengthened closures.@@@@1@9@@danf@17-8-2009 
10421650@unknown@formal@none@1@S@Geminate fricatives, nasals, and {{IPA|/l/}} are realized as lengthened [[continuant]]s.@@@@1@10@@danf@17-8-2009 10421660@unknown@formal@none@1@S@The flap consonant {{IPA|/ɾː/}} is typically dialectal, and it is called ''erre moscia''.@@@@1@13@@danf@17-8-2009 10421670@unknown@formal@none@1@S@The correct standard pronunciation is {{IPA|[r]}}.@@@@1@6@@danf@17-8-2009 10421680@unknown@formal@none@1@S@Of special interest to the linguistic study of Italian is the ''[[Tuscan gorgia|Gorgia Toscana]]'', or "Tuscan Throat", the weakening or [[lenition]] of certain [[:wiktionary:intervocalic|intervocalic]] consonants in [[Tuscan dialect]]s.@@@@1@28@@danf@17-8-2009 10421690@unknown@formal@none@1@S@See also [[Syntactic doubling]].@@@@1@4@@danf@17-8-2009 10421700@unknown@formal@none@1@S@===Assimilation===@@@@1@1@@danf@17-8-2009 10421710@unknown@formal@none@1@S@Italian has few diphthongs, so most unfamiliar diphthongs that are heard in foreign words (in particular, those beginning with vowel "a", "e", or "o") will be assimilated as the corresponding [[diaeresis]] (i.e., the vowel sounds will be pronounced separately).@@@@1@39@@danf@17-8-2009 10421720@unknown@formal@none@1@S@Italian [[phonotactics]] do not usually permit polysyllabic nouns and verbs to end with consonants, excepting poetry and song, so foreign words may receive extra terminal vowel sounds.@@@@1@27@@danf@17-8-2009 10421730@unknown@formal@none@1@S@==Grammar==@@@@1@1@@danf@17-8-2009 10421740@unknown@formal@none@1@S@===Common variations in the writing systems===@@@@1@6@@danf@17-8-2009 10421750@unknown@formal@none@1@S@Some variations in the usage of the writing system may be present in practical use.@@@@1@15@@danf@17-8-2009 10421760@unknown@formal@none@1@S@These are scorned by educated people, but they are so common in certain contexts that knowledge of them may be useful.@@@@1@21@@danf@17-8-2009 10421770@unknown@formal@none@1@S@* Usage of ''x'' instead of ''per'': this is very common among teenagers and in [[Text messaging|SMS]] abbreviations.@@@@1@18@@danf@17-8-2009 10421780@unknown@formal@none@1@S@The multiplication operator is pronounced "per" in Italian, and so it is sometimes used to replace the word "per", which means "for"; thus, for example, "per te" ("for you") is shortened to "x te" (compare with English "4 U").@@@@1@39@@danf@17-8-2009 10421790@unknown@formal@none@1@S@Words containing ''per'' can also have it replaced with ''x'': for example, ''perché'' (both "why" and "because") is often shortened as ''xché'' or ''xké'' or ''x' ''(see below).@@@@1@28@@danf@17-8-2009 10421800@unknown@formal@none@1@S@This usage might be useful to jot down quick notes or to fit more text into the low character limit of an SMS, but it is considered unacceptable in formal writing.@@@@1@31@@danf@17-8-2009 10421810@unknown@formal@none@1@S@* Usage of foreign letters such as ''k'', ''j'' and ''y'', especially in nicknames and SMS language: ''ke'' instead of ''che'', ''Giusy'' instead of ''Giuseppina'' (or sometimes ''Giuseppe'').@@@@1@28@@danf@17-8-2009 10421820@unknown@formal@none@1@S@This is curiously mirrored in the usage of ''i'' in English names such as ''Staci'' instead of ''Stacey'', or in the usage of ''c'' in [[Northern Europe]] (''Jacob'' instead of ''Jakob'').@@@@1@31@@danf@17-8-2009 10421830@unknown@formal@none@1@S@The use of "k" instead of "ch" or "c" to represent a plosive sound is documented in some historical texts from before the standardization of the Italian language; however, that usage is no longer 
standard in Italian.@@@@1@37@@danf@17-8-2009 10421840@unknown@formal@none@1@S@Possibly because it is associated with the [[German language]], the letter "k" has sometimes also been used in satire to suggest that a political figure is an authoritarian or even a "pseudo-nazi": [[Francesco Cossiga]] was famously nicknamed ''Kossiga'' by rioting students during his tenure as minister of internal affairs.@@@@1@49@@danf@17-8-2009 10421850@unknown@formal@none@1@S@[Cf. the [[alternative political spelling#"K" replacing "C"|politicized spelling ''Amerika'']] in the USA.]@@@@1@12@@danf@17-8-2009 10421860@unknown@formal@none@1@S@* Usage of the following abbreviations is limited to the electronic communications media and is deprecated in all other cases: '''nn''' instead of ''non'' (not), '''cmq''' instead of ''comunque'' (anyway, however), '''cm''' instead of ''come'' (how, like, as), '''d''' instead of ''di'' (of), '''(io/loro) sn''' instead of ''(io/loro) sono'' (I am/they are), '''(io) dv''' instead of ''(io) devo'' (I must/I have to) or instead of ''dove'' (where), '''(tu) 6''' instead of ''(tu) sei'' (you are).@@@@1@75@@danf@17-8-2009 10421870@unknown@formal@none@1@S@* Inexperienced typists often replace accents with apostrophes, such as in ''perche''' instead of ''perché''.@@@@1@15@@danf@17-8-2009 10421880@unknown@formal@none@1@S@Uppercase ''[[È]]'' is particularly rare, as it is absent from the [[Keyboard layout#Italian|Italian keyboard layout]], and is very often written as ''E''' (even though there are [[:it:Aiuto:Manuale di stile#Scrivere .C3.88|several ways]] of producing the uppercase È on a computer).@@@@1@39@@danf@17-8-2009 10421890@unknown@formal@none@1@S@This never happens in books or other professionally typeset material.@@@@1@10@@danf@17-8-2009 10421900@unknown@formal@none@1@S@==Samples==@@@@1@1@@danf@17-8-2009 10421910@unknown@formal@none@1@S@==Examples==@@@@1@1@@danf@17-8-2009 10421920@unknown@formal@none@1@S@*Cheers: "Salute!"@@@@1@2@@danf@17-8-2009 10421930@unknown@formal@none@1@S@*English: ''inglese'' {{IPA|/iŋˈglese/}}@@@@1@3@@danf@17-8-2009 10421940@unknown@formal@none@1@S@*Good-bye: ''arrivederci'' {{IPA|/arriveˈdertʃi/}}@@@@1@3@@danf@17-8-2009 10421950@unknown@formal@none@1@S@*Hello: ''[[ciao]]'' {{IPA|/ˈtʃao/}}@@@@1@3@@danf@17-8-2009 10421960@unknown@formal@none@1@S@*Good day: ''buon giorno'' {{IPA|/bwɔnˈdʒorno/}}@@@@1@5@@danf@17-8-2009 10421970@unknown@formal@none@1@S@*Good evening: ''buona sera'' {{IPA|/bwɔnaˈsera/}}@@@@1@5@@danf@17-8-2009 10421980@unknown@formal@none@1@S@*Yes: ''sì'' {{IPA|/si/}}@@@@1@3@@danf@17-8-2009 10421990@unknown@formal@none@1@S@*No: ''no'' {{IPA|/nɔ/}}@@@@1@3@@danf@17-8-2009 10422000@unknown@formal@none@1@S@*How are you? 
: Come stai {{IPA|/ˈkome ˈstai/}} (informal); Come sta {{IPA|/ˈkome 'sta/}} (formal)@@@@1@14@@danf@17-8-2009 10422010@unknown@formal@none@1@S@*Sorry: ''mi dispiace'' {{IPA|/mi disˈpjatʃe/}}@@@@1@5@@danf@17-8-2009 10422020@unknown@formal@none@1@S@*Excuse me: scusa {{IPA|/ˈskuza/}} (informal); scusi {{IPA|/ˈskuzi/}} (formal)@@@@1@8@@danf@17-8-2009 10422030@unknown@formal@none@1@S@*Again: ''di nuovo'', /{{IPA|di ˈnwɔvo}}/; ''ancora'' /{{IPA|aŋˈkora}}/@@@@1@7@@danf@17-8-2009 10422040@unknown@formal@none@1@S@*Always: ''sempre'' /{{IPA|ˈsɛmpre}}/@@@@1@3@@danf@17-8-2009 10422050@unknown@formal@none@1@S@*When: ''quando'' {{IPA|/ˈkwando/}}@@@@1@3@@danf@17-8-2009 10422060@unknown@formal@none@1@S@*Where: ''dove'' {{IPA|/'dove/}}@@@@1@3@@danf@17-8-2009 10422070@unknown@formal@none@1@S@*Why/Because: ''perché'' {{IPA|/perˈke/}}@@@@1@3@@danf@17-8-2009 10422080@unknown@formal@none@1@S@*How: ''come'' {{IPA|/'kome/}}@@@@1@3@@danf@17-8-2009 10422090@unknown@formal@none@1@S@*How much is it?: ''quanto costa?''@@@@1@6@@danf@17-8-2009 10422100@unknown@formal@none@1@S@{{IPA|/ˈkwanto/}}@@@@1@1@@danf@17-8-2009 10422110@unknown@formal@none@1@S@*Thank you!: ''grazie!''@@@@1@3@@danf@17-8-2009 10422120@unknown@formal@none@1@S@{{IPA|/ˈgrattsie/}}@@@@1@1@@danf@17-8-2009 10422130@unknown@formal@none@1@S@*Bon appetit: ''buon appetito'' {{IPA|/ˌbwɔn appeˈtito/}}@@@@1@6@@danf@17-8-2009 10422140@unknown@formal@none@1@S@*You're welcome!: ''prego!''@@@@1@3@@danf@17-8-2009 10422150@unknown@formal@none@1@S@{{IPA|/ˈprɛgo/}}@@@@1@1@@danf@17-8-2009 10422160@unknown@formal@none@1@S@*I love you: ''Ti amo'' {{IPA|/ti ˈamo/}}, ''Ti voglio bene'' {{IPA|/ti ˈvɔʎʎo ˈbɛne/}}.@@@@1@13@@danf@17-8-2009 10422170@unknown@formal@none@1@S@The difference is that you use "Ti amo" when you are in a romantic relationship, "Ti voglio bene" in any other occasion (to parents, to relatives, to friends...)@@@@1@28@@danf@17-8-2009 10422180@unknown@formal@none@1@S@Counting to twenty:@@@@1@3@@danf@17-8-2009 10422190@unknown@formal@none@1@S@*One: ''uno'' {{IPA|/ˈuno/}}@@@@1@3@@danf@17-8-2009 10422200@unknown@formal@none@1@S@*Two: ''due'' {{IPA|/ˈdue/}}@@@@1@3@@danf@17-8-2009 10422210@unknown@formal@none@1@S@*Three: ''tre'' {{IPA|/tre/}}@@@@1@3@@danf@17-8-2009 10422220@unknown@formal@none@1@S@*Four: ''quattro'' {{IPA|/ˈkwattro/}}@@@@1@3@@danf@17-8-2009 10422230@unknown@formal@none@1@S@*Five: ''cinque'' {{IPA|/ˈʧiŋkwe/}}@@@@1@3@@danf@17-8-2009 10422240@unknown@formal@none@1@S@*Six: ''sei'' {{IPA|/ˈsɛi/}}@@@@1@3@@danf@17-8-2009 10422250@unknown@formal@none@1@S@*Seven: ''sette'' {{IPA|/ˈsɛtte/}}@@@@1@3@@danf@17-8-2009 10422260@unknown@formal@none@1@S@*Eight: ''otto'' {{IPA|/ˈɔtto/}}@@@@1@3@@danf@17-8-2009 10422270@unknown@formal@none@1@S@*Nine: ''nove'' {{IPA|/ˈnɔve/}}@@@@1@3@@danf@17-8-2009 10422280@unknown@formal@none@1@S@*Ten: ''dieci'' {{IPA|/ˈdjɛʧi/}}@@@@1@3@@danf@17-8-2009 10422290@unknown@formal@none@1@S@*Eleven: ''undici'' {{IPA|/ˈundiʧi/}}@@@@1@3@@danf@17-8-2009 10422300@unknown@formal@none@1@S@*Twelve: ''dodici'' {{IPA|/ˈdodiʧi/}}@@@@1@3@@danf@17-8-2009 10422310@unknown@formal@none@1@S@*Thirteen: ''tredici'' {{IPA|/ˈtrediʧi/}}@@@@1@3@@danf@17-8-2009 10422320@unknown@formal@none@1@S@*Fourteen: ''quattordici'' {{IPA|/kwat'tordiʧi/}}@@@@1@3@@danf@17-8-2009 10422330@unknown@formal@none@1@S@*Fifteen: ''quindici'' {{IPA|/ˈkwindiʧi/}}@@@@1@3@@danf@17-8-2009 10422340@unknown@formal@none@1@S@*Sixteen: ''sedici'' {{IPA|/ˈsediʧi/}}@@@@1@3@@danf@17-8-2009 10422350@unknown@formal@none@1@S@*Seventeen: ''diciassette'' {{IPA|/diʧas'sɛtte/}}@@@@1@3@@danf@17-8-2009 
10422360@unknown@formal@none@1@S@*Eighteen: ''diciotto'' {{IPA|/di'ʧɔtto/}}@@@@1@3@@danf@17-8-2009 10422370@unknown@formal@none@1@S@*Nineteen: ''diciannove'' {{IPA|/diʧan'nɔve/}}@@@@1@3@@danf@17-8-2009 10422380@unknown@formal@none@1@S@*Twenty: ''venti'' {{IPA|/'venti/}}@@@@1@3@@danf@17-8-2009 10422390@unknown@formal@none@1@S@The days of the week:@@@@1@5@@danf@17-8-2009 10422400@unknown@formal@none@1@S@*Monday: ''lunedì'' {{IPA|/lune'di/}}@@@@1@3@@danf@17-8-2009 10422410@unknown@formal@none@1@S@*Tuesday: ''martedì'' {{IPA|/marte'di/}}@@@@1@3@@danf@17-8-2009 10422420@unknown@formal@none@1@S@*Wednesday: ''mercoledì'' {{IPA|/merkole'di/}}@@@@1@3@@danf@17-8-2009 10422430@unknown@formal@none@1@S@*Thursday: ''giovedì'' {{IPA|/dʒove'di/}}@@@@1@3@@danf@17-8-2009 10422440@unknown@formal@none@1@S@*Friday: ''venerdì'' {{IPA|/vener'di/}}@@@@1@3@@danf@17-8-2009 10422450@unknown@formal@none@1@S@*Saturday: ''sabato'' {{IPA|/ˈsabato/}}@@@@1@3@@danf@17-8-2009 10422460@unknown@formal@none@1@S@*Sunday: ''domenica'' {{IPA|/do'menika/}}@@@@1@3@@danf@17-8-2009 10422470@unknown@formal@none@1@S@==Sample texts==@@@@1@2@@danf@17-8-2009 10422480@unknown@formal@none@1@S@There is a recording of [[Dante]]'s [[Divine Comedy]] read by [[Lino Pertile]] available at http://etcweb.princeton.edu/dante/pdp/@@@@1@15@@danf@17-8-2009 10430010@unknown@formal@none@1@S@
Japanese language
@@@@1@2@@danf@17-8-2009 10430020@unknown@formal@none@1@S@{{Nihongo|'''Japanese'''|日本語 / にほんご |3=}} is a language spoken by over 130 million people in [[Japan]] and in Japanese emigrant communities.@@@@1@20@@danf@17-8-2009 10430030@unknown@formal@none@1@S@It is related to the [[Ryukyuan languages]], but whatever [[Classification of the Japanese language|relationships with other languages]] it may have remain undemonstrated.@@@@1@22@@danf@17-8-2009 10430040@unknown@formal@none@1@S@It is an [[agglutinative language]] and is distinguished by a complex system of [[Honorific speech in Japanese|honorifics]] reflecting the hierarchical nature of Japanese society, with verb forms and particular vocabulary to indicate the relative status of speaker, listener and the third person mentioned in conversation whether he is there or not.@@@@1@51@@danf@17-8-2009 10430050@unknown@formal@none@1@S@The sound inventory of Japanese is relatively small, and it has a lexically distinct [[Japanese pitch accent|pitch-accent]] system.@@@@1@18@@danf@17-8-2009 10430060@unknown@formal@none@1@S@It is a [[mora (linguistics)|mora]]-timed language.@@@@1@6@@danf@17-8-2009 10430070@unknown@formal@none@1@S@The Japanese language is written with a combination of three different types of scripts: [[Chinese characters]] called ''[[kanji]]'' (漢字 / かんじ), and two [[syllabary|syllabic]] scripts made up of modified [[Chinese characters]], ''[[hiragana]]'' (平仮名 / ひらがな) and ''[[katakana]]'' (片仮名 / カタカナ).@@@@1@40@@danf@17-8-2009 10430080@unknown@formal@none@1@S@The [[Latin alphabet]], ''[[rōmaji]]'' (ローマ字), is also often used in modern Japanese, especially for company names and logos, advertising, and when entering Japanese text into a computer.@@@@1@27@@danf@17-8-2009 10430090@unknown@formal@none@1@S@Western style [[Arabic numerals]] are generally used for numbers, but traditional [[Sino-Japanese vocabulary|Sino-Japanese]] numerals are also commonplace.@@@@1@17@@danf@17-8-2009 10430100@unknown@formal@none@1@S@Japanese [[vocabulary]] has been heavily influenced by [[loanword]]s from other languages.@@@@1@11@@danf@17-8-2009 10430110@unknown@formal@none@1@S@A vast number of words were borrowed from [[Chinese language|Chinese]], or created from Chinese models, over a period of at least 1,500 years.@@@@1@23@@danf@17-8-2009 10430120@unknown@formal@none@1@S@Since the late 19th century, Japanese has borrowed a considerable number of words from [[Indo-European languages]], primarily [[English language|English]].@@@@1@19@@danf@17-8-2009 10430130@unknown@formal@none@1@S@Because of the special trade relationship between Japan and first [[Portugal]] in the 16th century, and then mainly the [[Netherlands]] in the 17th century, [[Portuguese language|Portuguese]], [[German language|German]] and [[Dutch language|Dutch]] have also been influential.@@@@1@35@@danf@17-8-2009 10430140@unknown@formal@none@1@S@== Geographic distribution ==@@@@1@4@@danf@17-8-2009 10430150@unknown@formal@none@1@S@Although Japanese is spoken almost exclusively in Japan, it has been and sometimes still is spoken elsewhere.@@@@1@17@@danf@17-8-2009 10430160@unknown@formal@none@1@S@When [[Imperial Japan|Japan]] occupied [[Korea]], [[Taiwan]], parts of the [[Chinese mainland]], and various Pacific islands before and during [[World War II]], locals in [[Greater East Asia Co-Prosperity Sphere|those countries]] were forced to learn Japanese in empire-building programs.@@@@1@37@@danf@17-8-2009 10430170@unknown@formal@none@1@S@As a result, there are many people in these countries who can 
speak Japanese in addition to the local languages.@@@@1@20@@danf@17-8-2009 10430180@unknown@formal@none@1@S@Japanese emigrant communities (the largest of which are to be found in [[Brazil]]) sometimes employ Japanese as their primary language.@@@@1@20@@danf@17-8-2009 10430190@unknown@formal@none@1@S@Approximately 5% of Hawaii residents speak Japanese, with Japanese ancestry being the largest single ancestry in the state (over 24% of the population).@@@@1@22@@danf@17-8-2009 10430200@unknown@formal@none@1@S@Japanese emigrants can also be found in [[Peru]], [[Argentina]], [[Australia]] (especially [[Sydney]], [[Brisbane]], and [[Melbourne]]), the [[United States]] (notably [[California]], where 1.2% of the population has Japanese ancestry, and [[Hawaii]]), and the [[Philippines]] (particularly in [[Davao]] and [[Laguna (province)|Laguna]]).@@@@1@39@@danf@17-8-2009 10430210@unknown@formal@none@1@S@Their descendants, who are known as {{transl|ja|''[[nikkei]]''}} ({{lang|ja|日系}}, literally Japanese descendants), however, rarely speak Japanese fluently after the second generation.@@@@1@20@@danf@17-8-2009 10430220@unknown@formal@none@1@S@There are estimated to be several million non-Japanese studying the language as well.@@@@1@13@@danf@17-8-2009 10430230@unknown@formal@none@1@S@=== Official status ===@@@@1@4@@danf@17-8-2009 10430240@unknown@formal@none@1@S@Japanese is the de facto official language of Japan.@@@@1@9@@danf@17-8-2009 10430250@unknown@formal@none@1@S@There is a form of the language considered standard: {{nihongo|''hyōjungo''|標準語|}} Standard Japanese, or {{nihongo|''kyōtsūgo''|共通語|}} the common language.@@@@1@17@@danf@17-8-2009 10430260@unknown@formal@none@1@S@The meanings of the two terms are almost the same.@@@@1@10@@danf@17-8-2009 10430270@unknown@formal@none@1@S@{{transl|ja|''Hyōjungo''}} or {{transl|ja|''kyōtsūgo''}} is a term that contrasts with dialect.@@@@1@12@@danf@17-8-2009 10430280@unknown@formal@none@1@S@This normative language emerged after the {{nihongo|[[Meiji Restoration]]|明治維新|meiji ishin|1868}} from the language spoken in the uptown areas of [[Tokyo]], out of the need for a common means of communication.@@@@1@20@@danf@17-8-2009 10430290@unknown@formal@none@1@S@{{transl|ja|''Hyōjungo''}} is taught in schools and used on television and in official communications, and is the version of Japanese discussed in this article.@@@@1@23@@danf@17-8-2009 10430300@unknown@formal@none@1@S@Formerly, standard {{nihongo|Japanese in writing|文語|[[Bungo (Japanese language)|bungo]]|"literary language"}} was different from {{nihongo|colloquial language|口語|[[Kogo (Japanese language)|kōgo]]}}.@@@@1@15@@danf@17-8-2009 10430310@unknown@formal@none@1@S@The two systems have different rules of grammar and some variance in vocabulary.@@@@1@13@@danf@17-8-2009 10430320@unknown@formal@none@1@S@{{transl|ja|''Bungo''}} was the main method of writing Japanese until about 1900; since then {{transl|ja|''kōgo''}} gradually extended its influence and the two methods were both used in writing until the 1940s.@@@@1@30@@danf@17-8-2009 10430330@unknown@formal@none@1@S@{{transl|ja|''Bungo''}} still has some relevance for historians, literary scholars, and lawyers (many Japanese laws that survived [[World War II]] are still written in {{transl|ja|''bungo''}}, although there are ongoing efforts to modernize their language).@@@@1@33@@danf@17-8-2009 10430340@unknown@formal@none@1@S@{{transl|ja|''Kōgo''}} is the predominant method of both speaking and writing Japanese today, although {{transl|ja|''bungo''}} grammar and vocabulary are
occasionally used in modern Japanese for effect.@@@@1@25@@danf@17-8-2009 10430350@unknown@formal@none@1@S@=== Dialects ===@@@@1@3@@danf@17-8-2009 10430360@unknown@formal@none@1@S@Dozens of dialects are spoken in Japan.@@@@1@7@@danf@17-8-2009 10430370@unknown@formal@none@1@S@The profusion is due to many factors, including the length of time the [[Japanese Archipelago|archipelago]] has been inhabited, its mountainous island terrain, and Japan's long history of both external and internal isolation.@@@@1@32@@danf@17-8-2009 10430380@unknown@formal@none@1@S@Dialects typically differ in terms of [[Japanese pitch accent|pitch accent]], inflectional [[morphology (linguistics)|morphology]], [[vocabulary]], and particle usage.@@@@1@17@@danf@17-8-2009 10430390@unknown@formal@none@1@S@Some even differ in [[vowel]] and [[consonant]] inventories, although this is uncommon.@@@@1@12@@danf@17-8-2009 10430400@unknown@formal@none@1@S@The main distinction in Japanese accents is between {{nihongo|Tokyo-type|東京式|Tōkyō-shiki}} and {{nihongo|Kyoto-Osaka-type|京阪式|Keihan-shiki}}, though Kyūshū-type dialects form a third, smaller group.@@@@1@19@@danf@17-8-2009 10430410@unknown@formal@none@1@S@Within each type are several subdivisions.@@@@1@6@@danf@17-8-2009 10430420@unknown@formal@none@1@S@Kyoto-Osaka-type dialects are in the central region, with borders roughly formed by [[Toyama Prefecture|Toyama]], [[Kyoto Prefecture|Kyōto]], [[Hyōgo Prefecture|Hyōgo]], and [[Mie Prefecture|Mie]] Prefectures; most [[Shikoku]] dialects are also that type.@@@@1@29@@danf@17-8-2009 10430430@unknown@formal@none@1@S@The final category of dialects comprises those descended from the Eastern dialect of [[Old Japanese]]; these dialects are spoken on [[Hachijōjima|Hachijō-jima island]] and a few other islands.@@@@1@27@@danf@17-8-2009 10430440@unknown@formal@none@1@S@Dialects from peripheral regions, such as [[Tōhoku Region|Tōhoku]] or [[Tsushima Island|Tsushima]], may be unintelligible to speakers from other parts of the country.@@@@1@22@@danf@17-8-2009 10430450@unknown@formal@none@1@S@The several dialects of [[Kagoshima Prefecture|Kagoshima]] in southern [[Kyūshū]] are famous for being unintelligible not only to speakers of standard Japanese but to speakers of nearby dialects elsewhere in Kyūshū as well.@@@@1@32@@danf@17-8-2009 10430460@unknown@formal@none@1@S@This is probably due in part to the Kagoshima dialects' peculiarities of pronunciation, which include the existence of closed syllables (i.e., syllables that end in a consonant, such as {{IPA|/kob/}} or {{IPA|/koʔ/}} for Standard Japanese {{IPA|/kumo/}} "spider").@@@@1@37@@danf@17-8-2009 10430470@unknown@formal@none@1@S@The [[Kansai region|Kansai]] group of dialects is spoken and known by many Japanese, and the [[Osaka]] dialect in particular is associated with comedy (see [[Kansai dialect]]).@@@@1@25@@danf@17-8-2009 10430480@unknown@formal@none@1@S@Dialects of Tōhoku and North [[Kantō region|Kantō]] are stereotypically associated with farmers.@@@@1@12@@danf@17-8-2009 10430490@unknown@formal@none@1@S@The [[Ryūkyūan languages]], spoken in [[Okinawa Prefecture|Okinawa]] and the [[Amami Islands]], which are politically part of [[Kagoshima Prefecture|Kagoshima]], are distinct enough to be considered a separate branch of the [[Japonic languages|Japonic]] family.@@@@1@31@@danf@17-8-2009 10430500@unknown@formal@none@1@S@However, many ordinary Japanese people tend to consider the Ryūkyūan languages to be dialects of Japanese.@@@@1@15@@danf@17-8-2009 10430510@unknown@formal@none@1@S@Not only is each language unintelligible to
Japanese speakers, but most are unintelligible to those who speak other Ryūkyūan languages.@@@@1@20@@danf@17-8-2009 10430520@unknown@formal@none@1@S@Recently, Standard Japanese has become prevalent nationwide (including the Ryūkyū islands) due to [[education]], [[mass media]], and increased mobility within Japan, as well as economic integration.@@@@1@28@@danf@17-8-2009 10430530@unknown@formal@none@1@S@== Sounds ==@@@@1@3@@danf@17-8-2009 10430540@unknown@formal@none@1@S@{{IPA notice}}@@@@1@2@@danf@17-8-2009 10430550@unknown@formal@none@1@S@Japanese vowels are "pure" sounds.@@@@1@5@@danf@17-8-2009 10430560@unknown@formal@none@1@S@The only unusual vowel is the high back vowel {{IPA|/ɯ/}}, which is like {{IPA|/u/}}, but [[roundedness|compressed]] instead of rounded.@@@@1@20@@danf@17-8-2009 10430570@unknown@formal@none@1@S@Japanese has five vowels, and [[vowel length]] is phonemic, so each one has both a short and a long version.@@@@1@20@@danf@17-8-2009 10430580@unknown@formal@none@1@S@Some Japanese consonants have several [[allophone]]s, which may give the impression of a larger inventory of sounds.@@@@1@17@@danf@17-8-2009 10430590@unknown@formal@none@1@S@However, some of these allophones have since become phonemic.@@@@1@9@@danf@17-8-2009 10430600@unknown@formal@none@1@S@For example, in the Japanese language up to and including the first half of the twentieth century, the phonemic sequence {{IPA|/ti/}} was [[palatalization|palatalized]] and realized phonetically as {{IPA|[tɕi]}}, approximately ''chi''; however, now {{IPA|/ti/}} and {{IPA|/tɕi/}} are distinct, as evidenced by words like ''tī'' {{IPA|[tiː]}} "Western style tea" and ''chii'' {{IPA|[tɕii]}} "social status."@@@@1@53@@danf@17-8-2009 10430610@unknown@formal@none@1@S@The 'r' of the Japanese language (technically a [[lateral consonant|lateral]] [[apical consonant|apical]] postalveolar flap) is of particular interest, sounding to most English speakers to be something between an 'l' and a [[retroflex consonant|retroflex]] 'r' depending on its position in a word.@@@@1@41@@danf@17-8-2009 10430620@unknown@formal@none@1@S@The syllabic structure and the [[phonotactics]] are very simple: the only [[consonant cluster]]s allowed within a syllable consist of one of a subset of the consonants plus {{IPA|/j/}}.@@@@1@28@@danf@17-8-2009 10430630@unknown@formal@none@1@S@These types of clusters occur only in onsets.@@@@1@8@@danf@17-8-2009 10430640@unknown@formal@none@1@S@However, consonant clusters across syllables are allowed as long as the two consonants are a nasal followed by a [[homo-organic]] consonant.@@@@1@21@@danf@17-8-2009 10430650@unknown@formal@none@1@S@[[Consonant length]] (gemination) is also phonemic.@@@@1@6@@danf@17-8-2009 10430660@unknown@formal@none@1@S@== Grammar ==@@@@1@3@@danf@17-8-2009 10430670@unknown@formal@none@1@S@=== Sentence structure ===@@@@1@4@@danf@17-8-2009 10430680@unknown@formal@none@1@S@Japanese word order is classified as [[Subject Object Verb]].@@@@1@9@@danf@17-8-2009 10430690@unknown@formal@none@1@S@However, unlike many [[Indo-European language]]s, Japanese sentences require only that verbs come last for intelligibility.@@@@1@15@@danf@17-8-2009 10430700@unknown@formal@none@1@S@This is because the Japanese [[sentence element]]s are marked with [[Japanese particles|particles]] that identify their grammatical functions.@@@@1@17@@danf@17-8-2009 10430710@unknown@formal@none@1@S@The basic sentence structure is [[topic-comment]].@@@@1@6@@danf@17-8-2009 10430720@unknown@formal@none@1@S@For example,
{{transl|ja|''Kochira-wa Tanaka-san desu''}} ({{lang|ja|こちらは田中さんです}}).@@@@1@6@@danf@17-8-2009 10430730@unknown@formal@none@1@S@{{transl|ja|''Kochira''}} ("this") is the topic of the sentence, indicated by the particle ''-wa''.@@@@1@13@@danf@17-8-2009 10430740@unknown@formal@none@1@S@The verb is {{transl|ja|''desu''}}, a [[copula]], commonly translated as "to be" or "it is" (though there are other verbs that can be translated as "to be").@@@@1@26@@danf@17-8-2009 10430750@unknown@formal@none@1@S@As a phrase, {{transl|ja|''Tanaka-san desu''}} is the comment.@@@@1@8@@danf@17-8-2009 10430760@unknown@formal@none@1@S@This sentence loosely translates to "As for this person, (it) is Mr./Mrs./Miss Tanaka."@@@@1@13@@danf@17-8-2009 10430770@unknown@formal@none@1@S@Thus Japanese, like [[Chinese language|Chinese]], [[Korean language|Korean]], and many other Asian languages, is often called a [[topic-prominent language]], which means it has a strong tendency to indicate the topic separately from the subject, and the two do not always coincide.@@@@1@40@@danf@17-8-2009 10430780@unknown@formal@none@1@S@The sentence {{transl|ja|''Zō-wa hana-ga nagai (desu)''}} ({{lang|ja|象は鼻が長いです}}) literally means, "As for elephants, (their) noses are long".@@@@1@15@@danf@17-8-2009 10430790@unknown@formal@none@1@S@The topic is {{transl|ja|''zō''}} "elephant", and the subject is {{transl|ja|''hana''}} "nose".@@@@1@11@@danf@17-8-2009 10430800@unknown@formal@none@1@S@Japanese is a [[pro-drop language]], meaning that the subject or object of a sentence need not be stated if it is obvious from context.@@@@1@24@@danf@17-8-2009 10430810@unknown@formal@none@1@S@In addition, it is commonly felt, particularly in spoken Japanese, that the shorter a sentence is, the better.@@@@1@18@@danf@17-8-2009 10430820@unknown@formal@none@1@S@As a result of this grammatical permissiveness and tendency towards brevity, Japanese speakers tend naturally to omit words from sentences, rather than refer to them with [[pronoun]]s.@@@@1@27@@danf@17-8-2009 10430830@unknown@formal@none@1@S@In the context of the above example, {{transl|ja|''hana-ga nagai''}} would mean "[their] noses are long," while {{transl|ja|''nagai''}} by itself would mean "[they] are long."@@@@1@24@@danf@17-8-2009 10430840@unknown@formal@none@1@S@A single verb can be a complete sentence: {{transl|ja|''Yatta!''}}@@@@1@9@@danf@17-8-2009 10430850@unknown@formal@none@1@S@"[I / we / they / etc] did [it]!".@@@@1@9@@danf@17-8-2009 10430860@unknown@formal@none@1@S@In addition, since adjectives can form the predicate in a Japanese sentence (below), a single adjective can be a complete sentence: {{transl|ja|''Urayamashii!''}}@@@@1@22@@danf@17-8-2009 10430870@unknown@formal@none@1@S@"[I'm] jealous [of it]!".@@@@1@4@@danf@17-8-2009 10430880@unknown@formal@none@1@S@While the language has some words that are typically translated as pronouns, these are not used as frequently as pronouns in some [[Indo-European language]]s, and function differently.@@@@1@27@@danf@17-8-2009 10430890@unknown@formal@none@1@S@Instead, Japanese typically relies on special verb forms and auxiliary verbs to indicate the direction of benefit of an action: "down" to indicate the out-group gives a benefit to the in-group; and "up" to indicate the in-group gives a benefit to the out-group.@@@@1@43@@danf@17-8-2009 10430900@unknown@formal@none@1@S@Here, the in-group includes the speaker and the out-group doesn't, and their boundary depends on context.@@@@1@16@@danf@17-8-2009 10430910@unknown@formal@none@1@S@For example, 
{{transl|ja|''oshiete moratta''}} (literally, "explained" with a benefit from the out-group to the in-group) means "[he/she/they] explained it to [me/us]".@@@@1@21@@danf@17-8-2009 10430920@unknown@formal@none@1@S@Similarly, {{transl|ja|''oshiete ageta''}} (literally, "explained" with a benefit from the in-group to the out-group) means "[I/we] explained [it] to [him/her/them]".@@@@1@20@@danf@17-8-2009 10430930@unknown@formal@none@1@S@Such beneficiary auxiliary verbs thus serve a function comparable to that of pronouns and prepositions in Indo-European languages to indicate the actor and the recipient of an action.@@@@1@28@@danf@17-8-2009 10430940@unknown@formal@none@1@S@Japanese "pronouns" also function differently from most modern Indo-European pronouns (and more like nouns) in that they can take modifiers as any other noun may.@@@@1@25@@danf@17-8-2009 10430950@unknown@formal@none@1@S@For instance, one cannot say in English:@@@@1@7@@danf@17-8-2009 10430960@unknown@formal@none@1@S@:@@@@1@1@@danf@17-8-2009 10430970@unknown@formal@none@1@S@*The amazed he ran down the street. (grammatically incorrect)@@@@1@9@@danf@17-8-2009 10430980@unknown@formal@none@1@S@But one ''can'' grammatically say essentially the same thing in Japanese:@@@@1@11@@danf@17-8-2009 10430990@unknown@formal@none@1@S@: {{transl|ja|''Odoroita kare-wa michi-o hashitte itta.''}} (grammatically correct)@@@@1@8@@danf@17-8-2009 10431000@unknown@formal@none@1@S@This is partly due to the fact that these words evolved from regular nouns, such as {{transl|ja|''kimi''}} "you" ({{lang|ja|君}} "lord"), {{transl|ja|''anata''}} "you" ({{lang|ja|あなた}} "that side, yonder"), and {{transl|ja|''boku''}} "I" ({{lang|ja|僕}} "servant").@@@@1@31@@danf@17-8-2009 10431010@unknown@formal@none@1@S@This is why some linguists do not classify Japanese "pronouns" as pronouns, but rather as referential nouns.@@@@1@17@@danf@17-8-2009 10431020@unknown@formal@none@1@S@Japanese personal pronouns are generally used only in situations requiring special emphasis as to who is doing what to whom.@@@@1@20@@danf@17-8-2009 10431030@unknown@formal@none@1@S@The choice of words used as pronouns is correlated with the sex of the speaker and the social situation in which they are spoken: men and women alike in a formal situation generally refer to themselves as {{transl|ja|''watashi''}} ({{lang|ja|私}} "private") or {{transl|ja|''watakushi''}} (also {{lang|ja|私}}), while men in rougher or intimate conversation are much more likely to use the word {{transl|ja|''ore''}} ({{lang|ja|俺}} "oneself", "myself") or {{transl|ja|''boku''}}.@@@@1@65@@danf@17-8-2009 10431040@unknown@formal@none@1@S@Similarly, different words such as {{transl|ja|''anata''}}, {{transl|ja|''kimi''}}, and {{transl|ja|''omae''}} ({{lang|ja|お前}}, more formally {{lang|ja|御前}} "the one before me") may be used to refer to a listener depending on the listener's relative social position and the degree of familiarity between the speaker and the listener.@@@@1@43@@danf@17-8-2009 10431050@unknown@formal@none@1@S@When used in different social relationships, the same word may have positive (intimate or respectful) or negative (distant or disrespectful) connotations.@@@@1@21@@danf@17-8-2009 10431060@unknown@formal@none@1@S@Japanese often use titles of the person referred to where pronouns would be used in English.@@@@1@16@@danf@17-8-2009 10431070@unknown@formal@none@1@S@For example, when speaking to one's teacher, it is appropriate to use {{transl|ja|''sensei''}} ({{lang|ja|先生}}, teacher), but inappropriate to use 
{{transl|ja|''anata''}}.@@@@1@20@@danf@17-8-2009 10431080@unknown@formal@none@1@S@This is because {{transl|ja|''anata''}} is used to refer to people of equal or lower status, and one's teacher is presumed to have higher status.@@@@1@22@@danf@17-8-2009 10431090@unknown@formal@none@1@S@For English-speaking learners of Japanese, a frequent beginner's mistake is to include {{transl|ja|''watashi-wa''}} or {{transl|ja|''anata-wa''}} at the beginning of sentences as one would with ''I'' or ''you'' in English.@@@@1@30@@danf@17-8-2009 10431100@unknown@formal@none@1@S@Though such sentences are not grammatically incorrect, even in formal settings they would be considered unnatural and would equate in English to repeatedly using a noun where a [[pronoun]] would suffice.@@@@1@31@@danf@17-8-2009 10431110@unknown@formal@none@1@S@=== Inflection and conjugation ===@@@@1@5@@danf@17-8-2009 10431120@unknown@formal@none@1@S@Japanese nouns have no grammatical number, gender, or articles.@@@@1@10@@danf@17-8-2009 10431130@unknown@formal@none@1@S@The noun {{transl|ja|''hon''}} ({{lang|ja|本}}) may refer to a single book or several books; {{transl|ja|''hito''}} ({{lang|ja|人}}) can mean "person" or "people"; and {{transl|ja|''ki''}} ({{lang|ja|木}}) can be "tree" or "trees".@@@@1@28@@danf@17-8-2009 10431140@unknown@formal@none@1@S@Where number is important, it can be indicated by providing a quantity (often with a [[Japanese counter word|counter word]]) or (rarely) by adding a suffix.@@@@1@25@@danf@17-8-2009 10431150@unknown@formal@none@1@S@Words for people are usually understood as singular.@@@@1@8@@danf@17-8-2009 10431160@unknown@formal@none@1@S@Thus {{transl|ja|''Tanaka-san''}} usually means ''Mr./Mrs./Miss Tanaka''.@@@@1@6@@danf@17-8-2009 10431170@unknown@formal@none@1@S@Words that refer to people and animals can be made to indicate a group of individuals through the addition of a collective suffix (a noun suffix that indicates a group), such as {{transl|ja|''-tachi''}}, but this is not a true plural: the meaning is closer to the English phrase "and company".@@@@1@50@@danf@17-8-2009 10431180@unknown@formal@none@1@S@A group described as {{transl|ja|''Tanaka-san-tachi''}} may include people not named Tanaka.@@@@1@11@@danf@17-8-2009 10431190@unknown@formal@none@1@S@Some Japanese nouns are effectively plural, such as {{transl|ja|''hitobito''}} "people" and {{transl|ja|''wareware''}} "we/us", while the word {{transl|ja|''tomodachi''}} "friend" is considered singular, although plural in form.@@@@1@25@@danf@17-8-2009 10431200@unknown@formal@none@1@S@Verbs are [[Japanese verb conjugations|conjugated]] to show tenses, of which there are two: past and present, or non-past, which is used for the present and the future.@@@@1@27@@danf@17-8-2009 10431210@unknown@formal@none@1@S@For verbs that represent an ongoing process, the ''-te iru'' form indicates a continuous (or progressive) tense.@@@@1@17@@danf@17-8-2009 10431220@unknown@formal@none@1@S@For others that represent a change of state, the {{transl|ja|''-te iru''}} form indicates a perfect tense.@@@@1@16@@danf@17-8-2009 10431230@unknown@formal@none@1@S@For example, {{transl|ja|''kite iru''}} means "He has come (and is still here)", but {{transl|ja|''tabete iru''}} means "He is eating".@@@@1@19@@danf@17-8-2009 10431240@unknown@formal@none@1@S@Questions (both with an interrogative pronoun and yes/no questions) have the same structure as affirmative sentences, but with intonation rising at the end.@@@@1@23@@danf@17-8-2009 10431250@unknown@formal@none@1@S@In the formal register,
the question particle {{transl|ja|''-ka''}} is added.@@@@1@10@@danf@17-8-2009 10431260@unknown@formal@none@1@S@For example, {{transl|ja|''Ii desu''}} ({{lang|ja|いいです。}}) "It is OK" becomes {{transl|ja|''Ii desu-ka''}} ({{lang|ja|いいですか?}}) "Is it OK?".@@@@1@15@@danf@17-8-2009 10431270@unknown@formal@none@1@S@In a more informal tone sometimes the particle {{transl|ja|''-no''}} ({{lang|ja|の}}) is added instead to show a personal interest of the speaker: {{transl|ja|''Dōshite konai-no?''}}@@@@1@23@@danf@17-8-2009 10431280@unknown@formal@none@1@S@"Why aren't (you) coming?".@@@@1@4@@danf@17-8-2009 10431290@unknown@formal@none@1@S@Some simple queries are formed simply by mentioning the topic with an interrogative intonation to call for the hearer's attention: {{transl|ja|''Kore-wa?''}}@@@@1@21@@danf@17-8-2009 10431300@unknown@formal@none@1@S@"(What about) this?"; {{transl|ja|''Namae-wa?''}} ({{lang|ja|名前は?}}) "(What's your) name?".@@@@1@8@@danf@17-8-2009 10431310@unknown@formal@none@1@S@Negatives are formed by inflecting the verb.@@@@1@7@@danf@17-8-2009 10431320@unknown@formal@none@1@S@For example, {{transl|ja|''Pan-o taberu''}} ({{lang|ja|パンを食べる。}}) "I will eat bread" or "I eat bread" becomes {{transl|ja|''Pan-o tabenai''}} ({{lang|ja|パンを食べない。}}) "I will not eat bread" or "I do not eat bread".@@@@1@28@@danf@17-8-2009 10431330@unknown@formal@none@1@S@The so-called {{transl|ja|''-te''}} verb form is used for a variety of purposes: either progressive or perfect aspect (see above); combining verbs in a temporal sequence ({{transl|ja|''Asagohan-o tabete sugu dekakeru''}} "I'll eat breakfast and leave at once"), simple commands, conditional statements and permissions ({{transl|ja|''Dekakete-mo ii?''}} "May I go out?"), etc.@@@@1@49@@danf@17-8-2009 10431340@unknown@formal@none@1@S@The word {{transl|ja|''da''}} (plain), {{transl|ja|''desu''}} (polite) is the [[copula]] verb.@@@@1@10@@danf@17-8-2009 10431350@unknown@formal@none@1@S@It corresponds approximately to the English ''be'', but often takes on other roles, including a marker for tense, when the verb is conjugated into its past form {{transl|ja|''datta''}} (plain), {{transl|ja|''deshita''}} (polite).@@@@1@31@@danf@17-8-2009 10431360@unknown@formal@none@1@S@This comes into use because only {{transl|ja|''keiyōshi''}} adjectives and verbs can carry tense in Japanese.@@@@1@15@@danf@17-8-2009 10431370@unknown@formal@none@1@S@Two additional common verbs are used to indicate existence ("there is") or, in some contexts, property: {{transl|ja|''aru''}} (negative {{transl|ja|''nai''}}) and {{transl|ja|''iru''}} (negative {{transl|ja|''inai''}}), for inanimate and animate things, respectively.@@@@1@29@@danf@17-8-2009 10431380@unknown@formal@none@1@S@For example, {{transl|ja|''Neko ga iru''}} "There's a cat", {{transl|ja|''Ii kangae-ga nai''}} "[I] haven't got a good idea".@@@@1@17@@danf@17-8-2009 10431390@unknown@formal@none@1@S@Note that the negative forms of the verbs {{transl|ja|''iru''}} and {{transl|ja|''aru''}} are actually ''i''-adjectives and inflect as such, e.g. {{transl|ja|''Neko ga inakatta''}} "There was no cat".@@@@1@26@@danf@17-8-2009 10431400@unknown@formal@none@1@S@The verb "to do" ({{transl|ja|''suru''}}, polite form {{transl|ja|''shimasu''}}) is often used to make verbs from nouns ({{transl|ja|''ryōri suru''}} "to cook", {{transl|ja|''benkyō suru''}} "to study", etc.) 
and has been productive in creating modern slang words.@@@@1@34@@danf@17-8-2009 10431410@unknown@formal@none@1@S@Japanese also has a huge number of compound verbs to express concepts that are described in English using a verb and a preposition (e.g. {{transl|ja|''tobidasu''}} "to fly out, to flee," from {{transl|ja|''tobu''}} "to fly, to jump" + {{transl|ja|''dasu''}} "to put out, to emit").@@@@1@43@@danf@17-8-2009 10431420@unknown@formal@none@1@S@There are three types of [[Japanese adjectives|adjective]] (see also [[Japanese adjectives]]):@@@@1@11@@danf@17-8-2009 10431430@unknown@formal@none@1@S@# {{lang|ja|形容詞}} {{transl|ja|''keiyōshi''}}, or {{transl|ja|''i''}} adjectives, which have a [[Japanese verb conjugations|conjugating]] ending {{transl|ja|''i''}} ({{lang|ja|い}}) (such as {{lang|ja|あつい}} {{transl|ja|''atsui''}} "to be hot") which can become past ({{lang|ja|あつかった}} {{transl|ja|''atsukatta''}} "it was hot"), or negative ({{lang|ja|あつくない}} {{transl|ja|''atsuku nai''}} "it is not hot").@@@@1@40@@danf@17-8-2009 10431440@unknown@formal@none@1@S@Note that {{transl|ja|''nai''}} is also an {{transl|ja|''i''}} adjective, which can become past ({{lang|ja|あつくなかった}} {{transl|ja|''atsuku nakatta''}} "it was not hot").@@@@1@19@@danf@17-8-2009 10431450@unknown@formal@none@1@S@#: {{lang|ja|暑い日}} {{transl|ja|''atsui hi''}} "a hot day".@@@@1@7@@danf@17-8-2009 10431460@unknown@formal@none@1@S@# {{lang|ja|形容動詞}} {{transl|ja|''keiyōdōshi''}}, or {{transl|ja|''na''}} adjectives, which are followed by a form of the [[copula]], usually {{transl|ja|''na''}}.@@@@1@17@@danf@17-8-2009 10431470@unknown@formal@none@1@S@For example {{transl|ja|''hen''}} (strange)@@@@1@4@@danf@17-8-2009 10431480@unknown@formal@none@1@S@#: {{lang|ja|変なひと}} {{transl|ja|''hen na hito''}} "a strange person".@@@@1@8@@danf@17-8-2009 10431490@unknown@formal@none@1@S@# {{lang|ja|連体詞}} {{transl|ja|''rentaishi''}}, also called true adjectives, such as {{transl|ja|''ano''}} "that"@@@@1@11@@danf@17-8-2009 10431500@unknown@formal@none@1@S@#: {{lang|ja|あの山}} {{transl|ja|''ano yama''}} "that mountain".@@@@1@6@@danf@17-8-2009 10431510@unknown@formal@none@1@S@Both {{transl|ja|''keiyōshi''}} and {{transl|ja|''keiyōdōshi''}} may [[predicate (grammar)|predicate]] sentences.@@@@1@8@@danf@17-8-2009 10431520@unknown@formal@none@1@S@For example,@@@@1@2@@danf@17-8-2009 10431530@unknown@formal@none@1@S@: {{lang|ja|ご飯が熱い。}} {{transl|ja|''Gohan-ga atsui.''}}@@@@1@4@@danf@17-8-2009 10431540@unknown@formal@none@1@S@"The rice is hot."@@@@1@4@@danf@17-8-2009 10431550@unknown@formal@none@1@S@: {{lang|ja|彼は変だ。}} {{transl|ja|''Kare-wa hen da.''}}@@@@1@5@@danf@17-8-2009 10431560@unknown@formal@none@1@S@"He's strange."@@@@1@2@@danf@17-8-2009 10431570@unknown@formal@none@1@S@Both inflect, though they do not show the full range of conjugation found in true verbs.@@@@1@16@@danf@17-8-2009 10431580@unknown@formal@none@1@S@The {{transl|ja|''rentaishi''}} in Modern Japanese are few in number, and unlike the other words, are limited to directly modifying nouns.@@@@1@20@@danf@17-8-2009 10431590@unknown@formal@none@1@S@They never predicate sentences.@@@@1@4@@danf@17-8-2009 10431600@unknown@formal@none@1@S@Examples include {{transl|ja|''ookina''}} "big", {{transl|ja|''kono''}} "this", {{transl|ja|''iwayuru''}} "so-called" and {{transl|ja|''taishita''}} "amazing".@@@@1@11@@danf@17-8-2009 10431610@unknown@formal@none@1@S@Both {{transl|ja|''keiyōdōshi''}} and {{transl|ja|''keiyōshi''}} form [[adverb]]s, by following with {{transl|ja|''ni''}} in the case of 
{{transl|ja|''keiyōdōshi''}}:@@@@1@15@@danf@17-8-2009 10431620@unknown@formal@none@1@S@: {{lang|ja|変になる}} {{transl|ja|''hen ni naru''}} "become strange",@@@@1@7@@danf@17-8-2009 10431630@unknown@formal@none@1@S@and by changing {{transl|ja|''i''}} to {{transl|ja|''ku''}} in the case of {{transl|ja|''keiyōshi''}}:@@@@1@11@@danf@17-8-2009 10431640@unknown@formal@none@1@S@: {{lang|ja|熱くなる}} {{transl|ja|''atsuku naru''}} "become hot".@@@@1@6@@danf@17-8-2009 10431650@unknown@formal@none@1@S@The grammatical function of nouns is indicated by [[postposition]]s, also called [[Japanese particles|particles]].@@@@1@13@@danf@17-8-2009 10431660@unknown@formal@none@1@S@These include for example:@@@@1@4@@danf@17-8-2009 10431670@unknown@formal@none@1@S@* '''{{lang|ja|が}} {{transl|ja|''ga''}}''' for the [[nominative case]].@@@@1@7@@danf@17-8-2009 10431680@unknown@formal@none@1@S@Not necessarily a subject.@@@@1@4@@danf@17-8-2009 10431690@unknown@formal@none@1@S@: {{lang|ja|''彼'''が'''やった。''}}{{transl|ja|''Kare '''ga''' yatta.''}}@@@@1@4@@danf@17-8-2009 10431700@unknown@formal@none@1@S@"'''He''' did it."@@@@1@3@@danf@17-8-2009 10431710@unknown@formal@none@1@S@* '''{{lang|ja|に}} {{transl|ja|''ni''}}''' for the [[dative case]].@@@@1@7@@danf@17-8-2009 10431720@unknown@formal@none@1@S@: {{lang|ja|田中さん'''に'''あげて下さい。}} {{transl|ja|''Tanaka-san '''ni''' agete kudasai''}} "Please give it to '''Mr. Tanaka'''."@@@@1@12@@danf@17-8-2009 10431730@unknown@formal@none@1@S@It is also used for the [[lative]] case, indicating a motion to a location.@@@@1@14@@danf@17-8-2009 10431740@unknown@formal@none@1@S@: {{lang|ja|''日本'' '''に'''行きたい。}} {{transl|ja|'''''Nihon''' '''ni''' ikitai''}} "I want to go ''to'' '''Japan'''."@@@@1@12@@danf@17-8-2009 10431750@unknown@formal@none@1@S@* '''{{lang|ja|の}} {{transl|ja|''no''}}''' for the [[genitive case]], or nominalizing phrases.@@@@1@10@@danf@17-8-2009 10431760@unknown@formal@none@1@S@: {{lang|ja|私'''の'''カメラ。}} {{transl|ja|''watashi '''no''' kamera''}} "'''my''' camera"@@@@1@7@@danf@17-8-2009 10431770@unknown@formal@none@1@S@: {{lang|ja|スキーに行く'''の'''が好きです。}} {{transl|ja|''Sukī-ni iku '''no''' ga suki desu''}} "(I) like go'''ing''' skiing."@@@@1@12@@danf@17-8-2009 10431780@unknown@formal@none@1@S@* '''{{lang|ja|を}} {{transl|ja|''o''}}''' for the [[accusative case]].@@@@1@7@@danf@17-8-2009 10431790@unknown@formal@none@1@S@Not necessarily an object.@@@@1@4@@danf@17-8-2009 10431800@unknown@formal@none@1@S@: {{lang|ja|何'''を'''食べますか。}} {{transl|ja|''Nani '''o''' tabemasu ka?''}}@@@@1@6@@danf@17-8-2009 10431810@unknown@formal@none@1@S@"'''What''' will (you) eat?"@@@@1@4@@danf@17-8-2009 10431820@unknown@formal@none@1@S@* '''{{lang|ja|は}} {{transl|ja|''wa''}}''' for the topic.@@@@1@6@@danf@17-8-2009 10431830@unknown@formal@none@1@S@It can co-exist with case markers above except {{transl|ja|''no''}}, and it overrides {{transl|ja|''ga''}} and {{transl|ja|''o''}}.@@@@1@15@@danf@17-8-2009 10431840@unknown@formal@none@1@S@: {{lang|ja|私'''は'''タイ料理がいいです。}} {{transl|ja|''Watashi '''wa''' tai-ryōri ga ii desu.''}}@@@@1@8@@danf@17-8-2009 10431850@unknown@formal@none@1@S@"As for me, Thai food is good."@@@@1@7@@danf@17-8-2009 10431860@unknown@formal@none@1@S@The nominative marker {{transl|ja|''ga''}} after {{transl|ja|''watashi''}} is hidden under {{transl|ja|''wa''}}.@@@@1@10@@danf@17-8-2009 10431865@unknown@formal@none@1@S@(Note that English generally makes no distinction between sentence topic and subject.)@@@@1@12@@danf@17-8-2009 10431867@unknown@formal@none@1@S@Note: The difference between 
{{transl|ja|'''''wa'''''}} and {{transl|ja|'''''ga'''''}} goes beyond the distinction between sentence topic and subject.@@@@1@17@@danf@17-8-2009 10431870@unknown@formal@none@1@S@While {{transl|ja|''wa''}} indicates the topic, which the rest of the sentence describes or acts upon, it carries the implication that the subject indicated by {{transl|ja|''wa''}} is not unique, or may be part of a larger group.@@@@1@36@@danf@17-8-2009 10431880@unknown@formal@none@1@S@: {{transl|ja|''Ikeda-san '''wa''' yonjū-ni sai da.''}}@@@@1@6@@danf@17-8-2009 10431890@unknown@formal@none@1@S@"As for Mr. Ikeda, he is forty-two years old."@@@@1@9@@danf@17-8-2009 10431900@unknown@formal@none@1@S@Others in the group may also be of that age.@@@@1@10@@danf@17-8-2009 10431910@unknown@formal@none@1@S@Absence of {{transl|ja|''wa''}} often means the subject is the [[focus (linguistics)|focus]] of the sentence.@@@@1@14@@danf@17-8-2009 10431920@unknown@formal@none@1@S@: {{transl|ja|''Ikeda-san '''ga''' yonjū-ni sai da.''}}@@@@1@6@@danf@17-8-2009 10431930@unknown@formal@none@1@S@"It is Mr. Ikeda who is forty-two years old."@@@@1@9@@danf@17-8-2009 10431940@unknown@formal@none@1@S@This is a reply to an implicit or explicit question as to who in this group is forty-two years old.@@@@1@18@@danf@17-8-2009 10431950@unknown@formal@none@1@S@=== Politeness ===@@@@1@3@@danf@17-8-2009 10431960@unknown@formal@none@1@S@Unlike most Western languages, Japanese has an extensive grammatical system to express politeness and formality.@@@@1@15@@danf@17-8-2009 10431970@unknown@formal@none@1@S@Most relationships are not equal in Japanese [[society]].@@@@1@8@@danf@17-8-2009 10431980@unknown@formal@none@1@S@The differences in social position are determined by a variety of factors including job, age, experience, or even psychological state (e.g., a person asking a favour tends to do so politely).@@@@1@31@@danf@17-8-2009 10431990@unknown@formal@none@1@S@The person in the lower position is expected to use a polite form of speech, whereas the other might use a plainer form.@@@@1@24@@danf@17-8-2009 10432000@unknown@formal@none@1@S@Strangers will also speak to each other politely.@@@@1@8@@danf@17-8-2009 10432010@unknown@formal@none@1@S@Japanese children rarely use polite speech until they are teens, at which point they are expected to begin speaking in a more adult manner.@@@@1@24@@danf@17-8-2009 10432020@unknown@formal@none@1@S@''See [[uchi-soto]]''.@@@@1@2@@danf@17-8-2009 10432030@unknown@formal@none@1@S@Whereas {{transl|ja|''teineigo''}} ({{lang|ja|丁寧語}}) (polite language) is commonly an [[inflection]]al system, {{transl|ja|''sonkeigo''}} ({{lang|ja|尊敬語}}) (respectful language) and {{transl|ja|''kenjōgo''}} ({{lang|ja|謙譲語}}) (humble language) often employ many special honorific and humble alternate verbs: {{transl|ja|''iku''}} "go" becomes {{transl|ja|''ikimasu''}} in polite form, but is replaced by {{transl|ja|''irassharu''}} in honorific speech and {{transl|ja|''ukagau''}} or {{transl|ja|''mairu''}} in humble speech.@@@@1@50@@danf@17-8-2009 10432040@unknown@formal@none@1@S@The difference between honorific and humble speech is particularly pronounced in the Japanese language.@@@@1@14@@danf@17-8-2009 10432050@unknown@formal@none@1@S@Humble language is used to talk about oneself or one's own group (company, family) whilst honorific language is mostly used when describing the interlocutor and his/her group.@@@@1@27@@danf@17-8-2009 10432060@unknown@formal@none@1@S@For example, the {{transl|ja|''-san''}} suffix ("Mr.", "Mrs.",
or "Miss") is an example of honorific language.@@@@1@15@@danf@17-8-2009 10432070@unknown@formal@none@1@S@It is not used to talk about oneself or when talking about someone from one's company to an external person, since the company is the speaker's "group".@@@@1@27@@danf@17-8-2009 10432080@unknown@formal@none@1@S@When speaking directly to one's superior in one's company or when speaking with other employees within one's company about a superior, a Japanese person will use vocabulary and inflections of the honorific register to refer to the in-group superior and his or her speech and actions.@@@@1@46@@danf@17-8-2009 10432090@unknown@formal@none@1@S@When speaking to a person from another company (i.e., a member of an out-group), however, a Japanese person will use the plain or the humble register to refer to the speech and actions of his or her own in-group superiors.@@@@1@40@@danf@17-8-2009 10432100@unknown@formal@none@1@S@In short, the register used in Japanese to refer to the person, speech, or actions of any particular individual varies depending on the relationship (either in-group or out-group) between the speaker and listener, as well as depending on the relative status of the speaker, listener, and third-person referents.@@@@1@48@@danf@17-8-2009 10432110@unknown@formal@none@1@S@For this reason, the Japanese system for explicit indication of social register is known as a system of "relative honorifics."@@@@1@20@@danf@17-8-2009 10432120@unknown@formal@none@1@S@This stands in stark contrast to the [[Korean language|Korean]] system of "absolute honorifics," in which the same register is used to refer to a particular individual (e.g. one's father, one's company president, etc.) in any context regardless of the relationship between the speaker and interlocutor.@@@@1@45@@danf@17-8-2009 10432130@unknown@formal@none@1@S@Thus, polite Korean speech can sound very presumptuous when translated verbatim into Japanese, as in Korean it is acceptable and normal to say things like "Our '''Mr.''' Company-President..." 
when communicating with a member of an out-group, which would be very inappropriate in a Japanese social context.@@@@1@46@@danf@17-8-2009 10432140@unknown@formal@none@1@S@Most [[noun]]s in the Japanese language may be made polite by the addition of {{transl|ja|''o-''}} or {{transl|ja|''go-''}} as a prefix.@@@@1@20@@danf@17-8-2009 10432145@unknown@formal@none@1@S@{{transl|ja|''o-''}} is generally used for words of native Japanese origin, whereas {{transl|ja|''go-''}} is affixed to words of Chinese derivation.@@@@1@19@@danf@17-8-2009 10432150@unknown@formal@none@1@S@In some cases, the prefix has become a fixed part of the word, and is included even in regular speech, such as {{transl|ja|''gohan''}} 'cooked rice; meal.'@@@@1@26@@danf@17-8-2009 10432160@unknown@formal@none@1@S@Such a construction often indicates deference to either the item's owner or to the object itself.@@@@1@16@@danf@17-8-2009 10432170@unknown@formal@none@1@S@For example, the word {{transl|ja|''tomodachi''}} 'friend,' would become {{transl|ja|''o-tomodachi''}} when referring to the friend of someone of higher status (though mothers often use this form to refer to their children's friends).@@@@1@31@@danf@17-8-2009 10432180@unknown@formal@none@1@S@On the other hand, a polite speaker may sometimes refer to {{transl|ja|''mizu''}} 'water' as {{transl|ja|''o-mizu''}} in order to show politeness.@@@@1@20@@danf@17-8-2009 10432190@unknown@formal@none@1@S@Most Japanese people employ politeness to indicate a lack of familiarity.@@@@1@11@@danf@17-8-2009 10432200@unknown@formal@none@1@S@That is, they use polite forms for new acquaintances, but if a relationship becomes more intimate, they no longer use them.@@@@1@21@@danf@17-8-2009 10432210@unknown@formal@none@1@S@This occurs regardless of age, social class, or gender.@@@@1@9@@danf@17-8-2009 10432220@unknown@formal@none@1@S@== Vocabulary ==@@@@1@3@@danf@17-8-2009 10432230@unknown@formal@none@1@S@The original language of Japan, or at least the original language of a certain population that was ancestral to a significant portion of the historical and present Japanese nation, was the so-called {{transl|ja|''yamato kotoba''}} ({{lang|ja|大和言葉}} or infrequently {{lang|ja|大和詞}}, i.e. "[[Yamato people|Yamato]] words"), which in scholarly contexts is sometimes referred to as {{transl|ja|''wa-go''}} ({{lang|ja|和語}} or rarely {{lang|ja|倭語}}, i.e. 
the {{transl|ja|"[[Wa (Japan)|Wa]]}} words").@@@@1@61@@danf@17-8-2009 10432240@unknown@formal@none@1@S@In addition to words from this original language, present-day Japanese includes a great number of words that were either borrowed from [[Chinese language|Chinese]] or constructed from Chinese roots following Chinese patterns.@@@@1@31@@danf@17-8-2009 10432250@unknown@formal@none@1@S@These words, known as {{transl|ja|''[[Sino-Japanese vocabulary|kango]]''}} ({{lang|ja|漢語}}), entered the language from the fifth century onwards via contact with Chinese culture.@@@@1@20@@danf@17-8-2009 10432260@unknown@formal@none@1@S@According to a [[Japanese dictionary]] ''Shinsen-kokugojiten'' (新選国語辞典), [[Sino-Japanese vocabulary|Chinese-based words]] comprise 49.1% of the total vocabulary, Wago is 33.8% and other foreign words are 8.8%.@@@@1@25@@danf@17-8-2009 10432270@unknown@formal@none@1@S@Like Latin-derived words in English, {{transl|ja|''[[Sino-Japanese vocabulary|kango]]''}} words typically are perceived as somewhat formal or academic compared to equivalent Yamato words.@@@@1@21@@danf@17-8-2009 10432280@unknown@formal@none@1@S@Indeed, it is generally fair to say that an English word derived from Latin/French roots typically corresponds to a Sino-Japanese word in Japanese, whereas a simpler Anglo-Saxon word would best be translated by a Yamato equivalent.@@@@1@36@@danf@17-8-2009 10432290@unknown@formal@none@1@S@A much smaller number of words has been borrowed from [[Korean language|Korean]] and [[Ainu language|Ainu]].@@@@1@15@@danf@17-8-2009 10432300@unknown@formal@none@1@S@Japan has also borrowed a number of words from other languages, particularly ones of European extraction, which are called {{transl|ja|''[[gairaigo]]''}}.@@@@1@20@@danf@17-8-2009 10432310@unknown@formal@none@1@S@This began with [[Japanese words of Portuguese origin|borrowings from Portuguese]] in the 16th century, followed by borrowing from [[Dutch language|Dutch]] during Japan's [[sakoku|long isolation]] of the [[Edo period]].@@@@1@28@@danf@17-8-2009 10432320@unknown@formal@none@1@S@With the [[Meiji Restoration]] and the reopening of Japan in the 19th century, borrowing occurred from [[German language|German]], [[French language|French]] and [[English language|English]].@@@@1@23@@danf@17-8-2009 10432330@unknown@formal@none@1@S@Currently, words of English origin are the most commonly borrowed.@@@@1@10@@danf@17-8-2009 10432340@unknown@formal@none@1@S@In the Meiji era, the Japanese also coined many neologisms using Chinese roots and morphology to translate Western concepts.@@@@1@19@@danf@17-8-2009 10432350@unknown@formal@none@1@S@The Chinese and Koreans imported many of these pseudo-Chinese words into [[Chinese language|Chinese]], [[Korean language|Korean]], and [[Vietnamese language|Vietnamese]] via their [[kanji]] in the late 19th and early 20th centuries.@@@@1@29@@danf@17-8-2009 10432360@unknown@formal@none@1@S@For example, {{lang|ja|政治}} {{transl|ja|''seiji''}} ("politics"), and {{lang|ja|化学}} {{transl|ja|''kagaku''}} ("chemistry") are words derived from Chinese roots that were first created and used by the Japanese, and only later borrowed into Chinese and other East Asian languages.@@@@1@35@@danf@17-8-2009 10432370@unknown@formal@none@1@S@As a result, Japanese, Chinese, Korean, and Vietnamese share a large common corpus of vocabulary in the same way a large number of Greek- and Latin-derived words are shared among modern European languages, although many academic words formed from such roots were certainly coined by native 
speakers of other languages, such as English.@@@@1@53@@danf@17-8-2009 10432380@unknown@formal@none@1@S@In the past few decades, {{transl|ja|''[[wasei-eigo]]''}} (made-in-Japan English) has become a prominent phenomenon.@@@@1@13@@danf@17-8-2009 10432390@unknown@formal@none@1@S@Words such as {{transl|ja|''wanpatān''}} {{lang|ja|ワンパターン}} (< ''one'' + ''pattern'', "to be in a rut", "to have a one-track mind") and {{transl|ja|''sukinshippu''}} {{lang|ja|スキンシップ}} (< ''skin'' + ''-ship'', "physical contact"), although coined by compounding English roots, are nonsensical in most non-Japanese contexts; exceptions exist in nearby languages such as Korean, however, which often uses words such as skinship and rimokon (remote control) in the same way as in Japanese.@@@@1@66@@danf@17-8-2009 10432400@unknown@formal@none@1@S@Additionally, many native Japanese words have become commonplace in English, due to the popularity of many Japanese cultural exports.@@@@1@19@@danf@17-8-2009 10432410@unknown@formal@none@1@S@Words such as [[futon]], [[haiku]], [[judo]], [[kamikaze]], [[karaoke]], [[karate]], [[ninja]], [[origami]], [[rickshaw]] (from {{lang|ja|人力車}} {{transl|ja|''jinrikisha''}}), [[samurai]], [[sayonara]], [[sumo]], [[sushi]], [[tsunami]], [[tycoon]] and many others have become part of the English language.@@@@1@31@@danf@17-8-2009 10432420@unknown@formal@none@1@S@See [[list of English words of Japanese origin]] for more.@@@@1@10@@danf@17-8-2009 10432430@unknown@formal@none@1@S@== Writing system ==@@@@1@4@@danf@17-8-2009 10432440@unknown@formal@none@1@S@Literacy was introduced to Japan in the form of the [[Chinese writing system]], by way of [[Baekje]] before the 5th century.@@@@1@21@@danf@17-8-2009 10432450@unknown@formal@none@1@S@Using this language, the Japanese emperor [[Emperor Yūryaku|Yūryaku]] sent a letter to the Chinese emperor [[Emperor Shun of Liu Song|Shun of Liu Song]] in 478 CE.@@@@1@24@@danf@17-8-2009 10432460@unknown@formal@none@1@S@After the fall of Baekje, Japan invited scholars from China to learn more of the Chinese writing system.@@@@1@18@@danf@17-8-2009 10432470@unknown@formal@none@1@S@Japanese Emperors gave an official rank to Chinese scholars (続守言/薩弘格/袁晋卿) and spread the use of Chinese characters from the 7th century to the 8th century.@@@@1@25@@danf@17-8-2009 10432480@unknown@formal@none@1@S@At first, the Japanese wrote in [[Classical Chinese]], with Japanese names represented by characters used for their meanings and not their sounds.@@@@1@22@@danf@17-8-2009 10432490@unknown@formal@none@1@S@Later, during the seventh century CE, the Chinese-sounding phoneme principle was used to write pure Japanese poetry and prose (comparable to Akkadian's retention of Sumerian cuneiform), but some Japanese words were still written with characters for their meaning and not the original Chinese sound.@@@@1@44@@danf@17-8-2009 10432500@unknown@formal@none@1@S@This is when the history of Japanese as a written language begins in its own right.@@@@1@16@@danf@17-8-2009 10432510@unknown@formal@none@1@S@By this time, the Japanese language was already distinct from the [[Ryukyuan languages]].@@@@1@13@@danf@17-8-2009 10432520@unknown@formal@none@1@S@The Korean settlers and their descendants used Kudara-on or Baekje pronunciation (百済音), which was also called Tsushima-pronunciation (対馬音) or [[Go-on]] (呉音).@@@@1@21@@danf@17-8-2009 10432530@unknown@formal@none@1@S@An example of this mixed style is the [[Kojiki]], which was written in 712 AD.@@@@1@15@@danf@17-8-2009 10432540@unknown@formal@none@1@S@They then started to
use Chinese characters to write Japanese in a style known as {{transl|ja|''man'yōgana''}}, a syllabic script which used Chinese characters for their sounds in order to transcribe the words of Japanese speech syllable by syllable.@@@@1@38@@danf@17-8-2009 10432550@unknown@formal@none@1@S@Over time, a writing system evolved.@@@@1@6@@danf@17-8-2009 10432560@unknown@formal@none@1@S@[[Chinese characters]] ([[kanji]]) were used to write either words borrowed from Chinese, or Japanese words with the same or similar meanings.@@@@1@21@@danf@17-8-2009 10432570@unknown@formal@none@1@S@Chinese characters were also used to write grammatical elements, were simplified, and eventually became two syllabic scripts: [[hiragana]] and [[katakana]].@@@@1@20@@danf@17-8-2009 10432580@unknown@formal@none@1@S@Modern Japanese is written in a mixture of three main systems: [[kanji]], characters of Chinese origin used to represent both Chinese [[loanword]]s into Japanese and a number of native Japanese [[morpheme]]s; and two [[syllabary|syllabaries]]: [[hiragana]] and [[katakana]].@@@@1@37@@danf@17-8-2009 10432590@unknown@formal@none@1@S@The [[Latin alphabet]] is also sometimes used.@@@@1@7@@danf@17-8-2009 10432600@unknown@formal@none@1@S@Arabic numerals are much more common than the kanji when used in counting, but kanji numerals are still used in compounds, such as {{lang|ja|統一}} {{transl|ja|''tōitsu''}} ("unification").@@@@1@26@@danf@17-8-2009 10432610@unknown@formal@none@1@S@''[[Hiragana]]'' are used for words without kanji representation, for words no longer written in kanji, and also following kanji to show conjugational endings.@@@@1@23@@danf@17-8-2009 10432620@unknown@formal@none@1@S@Because of the way verbs (and adjectives) in Japanese are [[conjugated]], kanji alone cannot fully convey Japanese tense and mood, as kanji cannot be subject to variation when written without losing its meaning.@@@@1@33@@danf@17-8-2009 10432630@unknown@formal@none@1@S@For this reason, hiragana are suffixed to the ends of kanji to show verb and adjective conjugations.@@@@1@17@@danf@17-8-2009 10432640@unknown@formal@none@1@S@Hiragana used in this way are called [[okurigana]].@@@@1@8@@danf@17-8-2009 10432650@unknown@formal@none@1@S@Hiragana are also written in a superscript called [[furigana]] above or beside a kanji to show the proper reading.@@@@1@19@@danf@17-8-2009 10432660@unknown@formal@none@1@S@This is done to facilitate learning, as well as to clarify particularly old or obscure (or sometimes invented) readings.@@@@1@19@@danf@17-8-2009 10432670@unknown@formal@none@1@S@''[[Katakana]]'', like hiragana, are a syllabary; katakana are primarily used to write foreign words, plant and animal names, and for emphasis.@@@@1@21@@danf@17-8-2009 10432680@unknown@formal@none@1@S@For example "Australia" has been adapted as {{transl|ja|''Ōsutoraria''}} ({{lang|ja|オーストラリア}}), and "supermarket" has been adapted and shortened into {{transl|ja|''sūpā''}} ({{lang|ja|スーパー}}).@@@@1@19@@danf@17-8-2009 10432690@unknown@formal@none@1@S@The [[Latin alphabet]] (in Japanese referred to as [[romaji|''Rōmaji'']] ({{lang|ja|ローマ字}}), literally "Roman letters") is used for some loan words like "CD" and "DVD", and also for some Japanese creations like "Sony".@@@@1@31@@danf@17-8-2009 10432700@unknown@formal@none@1@S@Historically, attempts to limit the number of kanji in use commenced in the mid-19th century, but did not become a matter of government intervention until after Japan's defeat in the Second World War.@@@@1@33@@danf@17-8-2009 
10432710@unknown@formal@none@1@S@During the period of post-war occupation (and influenced by the views of some U.S. officials), various schemes including the complete abolition of kanji and exclusive use of rōmaji were considered.@@@@1@30@@danf@17-8-2009 10432720@unknown@formal@none@1@S@The {{transl|ja|''[[jōyō kanji]]''}} ("common use kanji", originally called {{transl|ja|''[[tōyō kanji]]''}} [kanji for general use]) scheme arose as a compromise solution.@@@@1@20@@danf@17-8-2009 10432730@unknown@formal@none@1@S@Japanese students begin to learn kanji from their first year at elementary school.@@@@1@13@@danf@17-8-2009 10432740@unknown@formal@none@1@S@A guideline created by the Japanese Ministry of Education, the list of {{transl|ja|''[[kyōiku kanji]]''}} ("education kanji", a subset of {{transl|ja|''[[jōyō kanji]]''}}), specifies the 1,006 simple characters a child is to learn by the end of sixth grade.@@@@1@37@@danf@17-8-2009 10432750@unknown@formal@none@1@S@Children continue to study another 939 characters in junior high school, covering in total 1,945 {{transl|ja|''[[jōyō kanji]]''}}.@@@@1@17@@danf@17-8-2009 10432760@unknown@formal@none@1@S@The official list of {{transl|ja|''[[jōyō kanji]]''}} was revised several times, but the total number of officially sanctioned characters remained largely unchanged.@@@@1@21@@danf@17-8-2009 10432770@unknown@formal@none@1@S@As for kanji for personal names, the circumstances are somewhat complicated.@@@@1@11@@danf@17-8-2009 10432780@unknown@formal@none@1@S@{{transl|ja|''[[Jōyō kanji]]''}} and {{transl|ja|''[[jinmeiyō kanji]]''}} (an appendix of additional characters for names) are approved for registering personal names.@@@@1@18@@danf@17-8-2009 10432790@unknown@formal@none@1@S@Names containing unapproved characters are denied registration.@@@@1@7@@danf@17-8-2009 10432800@unknown@formal@none@1@S@However, as with the list of {{transl|ja|''[[jōyō kanji]]''}}, criteria for inclusion were often arbitrary and led to many common and popular characters being disapproved for use.@@@@1@26@@danf@17-8-2009 10432810@unknown@formal@none@1@S@Under popular pressure and following a court decision holding the exclusion of common characters unlawful, the list of {{transl|ja|''[[jinmeiyō kanji]]''}} was substantially extended from 92 in 1951 (the year it was first decreed) to 983 in 2004.@@@@1@37@@danf@17-8-2009 10432820@unknown@formal@none@1@S@Furthermore, families whose names are not on these lists were permitted to continue using the older forms.@@@@1@17@@danf@17-8-2009 10432830@unknown@formal@none@1@S@Many writers rely on [[newspaper]] circulation to publish their work with officially sanctioned characters.@@@@1@14@@danf@17-8-2009 10432840@unknown@formal@none@1@S@This distribution method is more efficient than traditional [[pen]] and [[paper]] publications.@@@@1@12@@danf@17-8-2009 10432850@unknown@formal@none@1@S@==Study by non-native speakers==@@@@1@4@@danf@17-8-2009 10432860@unknown@formal@none@1@S@Many major universities throughout the world provide Japanese language courses, and a number of secondary and even primary schools worldwide offer courses in the language.@@@@1@25@@danf@17-8-2009 10432870@unknown@formal@none@1@S@International interest in the Japanese language dates from the 1800s but has become more prevalent following Japan's economic bubble of the 1980s and the global popularity of [[Japanese pop culture]] (such as [[anime]] and [[video games]]) since the 1990s.@@@@1@39@@danf@17-8-2009 10432880@unknown@formal@none@1@S@About 2.3 million people studied the 
language worldwide in 2003: 900,000 South [[Koreans]], 389,000 [[People's Republic of China|Chinese]], 381,000 [[Australians]], and 140,000 [[United States|Americans]] studied Japanese in lower and higher educational institutions.@@@@1@32@@danf@17-8-2009 10432890@unknown@formal@none@1@S@In Japan, more than 90,000 foreign students study at [[List of universities in Japan|Japanese universities]] and Japanese [[language school]]s, including 77,000 Chinese and 15,000 South Koreans in 2003.@@@@1@28@@danf@17-8-2009 10432900@unknown@formal@none@1@S@In addition, local governments and some [[non-profit organisation|NPO]] groups provide free Japanese language classes for foreign residents, including [[Japanese Brazilians]] and foreigners married to Japanese nationals.@@@@1@26@@danf@17-8-2009 10432910@unknown@formal@none@1@S@In the United Kingdom, studies are supported by the [[British Association for Japanese Studies]].@@@@1@14@@danf@17-8-2009 10432920@unknown@formal@none@1@S@In Ireland, Japanese is offered as a language in the [[Leaving Certificate]] in some schools.@@@@1@15@@danf@17-8-2009 10432930@unknown@formal@none@1@S@The Japanese government provides standardised tests to measure spoken and written comprehension of Japanese for second-language learners; the most prominent is the [[Japanese Language Proficiency Test]] (JLPT).@@@@1@28@@danf@17-8-2009 10432940@unknown@formal@none@1@S@The Japanese External Trade Organisation [[JETRO]] organises the ''Business Japanese Proficiency Test'', which tests the learner's ability to understand Japanese in a business setting.@@@@1@24@@danf@17-8-2009 10432950@unknown@formal@none@1@S@When learning Japanese in a college setting, students are usually first taught how to pronounce Japanese written in [[romaji]].@@@@1@16@@danf@17-8-2009 10432960@unknown@formal@none@1@S@From that point, they are taught the two main syllabaries, with [[kanji]] usually being introduced in the second semester.@@@@1@19@@danf@17-8-2009 10432970@unknown@formal@none@1@S@Focus is usually first on polite (distal) speech, as this is the style that students who might interact with native speakers would be expected to use.@@@@1@21@@danf@17-8-2009 10432980@unknown@formal@none@1@S@Casual speech and formal speech usually follow polite speech, as does the usage of honorifics.@@@@1@16@@danf@17-8-2009 10440010@unknown@formal@none@1@S@
Java (programming language)
@@@@1@3@@danf@17-8-2009 10440020@unknown@formal@none@1@S@'''Java''' is a [[programming language]] originally developed by [[Sun Microsystems]] and released in 1995 as a core component of Sun Microsystems' [[Java (Sun)|Java platform]].@@@@1@24@@danf@17-8-2009 10440030@unknown@formal@none@1@S@The language derives much of its [[Syntax of programming languages|syntax]] from [[C (programming language)|C]] and [[C++]] but has a simpler [[object model]] and fewer low-level facilities.@@@@1@26@@danf@17-8-2009 10440040@unknown@formal@none@1@S@Java applications are typically [[compiler|compiled]] to [[bytecode]] that can run on any [[Java virtual machine]] (JVM) regardless of [[computer architecture]].@@@@1@20@@danf@17-8-2009 10440050@unknown@formal@none@1@S@The original and [[reference implementation]] Java [[compiler]]s, virtual machines, and [[library (computing)|class libraries]] were developed by Sun from 1995.@@@@1@19@@danf@17-8-2009 10440060@unknown@formal@none@1@S@As of May 2007, in compliance with the specifications of the [[Java Community Process]], Sun made available most of their Java technologies as [[free software]] under the [[GNU General Public License]].@@@@1@31@@danf@17-8-2009 10440070@unknown@formal@none@1@S@Others have also developed alternative implementations of these Sun technologies, such as the [[GNU Compiler for Java]] and [[GNU Classpath]].@@@@1@20@@danf@17-8-2009 10440080@unknown@formal@none@1@S@== History ==@@@@1@3@@danf@17-8-2009 10440090@unknown@formal@none@1@S@The Java language was created by [[James Gosling]] in June 1991 for use in one of his many [[set-top box]] projects.@@@@1@21@@danf@17-8-2009 10440100@unknown@formal@none@1@S@The language was initially called ''Oak'', after an [[oak tree]] that stood outside Gosling's office—and also went by the name ''Green''—and ended up later being renamed to ''Java'', from a list of random words.@@@@1@34@@danf@17-8-2009 10440110@unknown@formal@none@1@S@Gosling's goals were to implement a [[virtual machine]] and a language that had a familiar C/C++ style of notation.@@@@1@19@@danf@17-8-2009 10440120@unknown@formal@none@1@S@The first public implementation was Java 1.0 in 1995.@@@@1@9@@danf@17-8-2009 10440130@unknown@formal@none@1@S@It promised "[[Write once, run anywhere|Write Once, Run Anywhere]]" (WORA), providing no-cost runtimes on popular platforms.@@@@1@16@@danf@17-8-2009 10440140@unknown@formal@none@1@S@It was fairly secure and its security was configurable, allowing network and file access to be restricted.@@@@1@17@@danf@17-8-2009 10440150@unknown@formal@none@1@S@Major web browsers soon incorporated the ability to run secure Java ''[[applet]]s'' within web pages.@@@@1@15@@danf@17-8-2009 10440160@unknown@formal@none@1@S@Java quickly became popular.@@@@1@4@@danf@17-8-2009 10440170@unknown@formal@none@1@S@With the advent of ''Java 2'', new versions had multiple configurations built for different types of platforms.@@@@1@17@@danf@17-8-2009 10440180@unknown@formal@none@1@S@For example, ''[[J2EE]]'' was for enterprise applications and the greatly stripped down version ''[[J2ME]]'' was for mobile applications.@@@@1@18@@danf@17-8-2009 10440190@unknown@formal@none@1@S@''[[J2SE]]'' was the designation for the Standard Edition.@@@@1@8@@danf@17-8-2009 10440200@unknown@formal@none@1@S@In 2006, for marketing purposes, new ''J2'' versions were renamed ''Java EE'', ''Java ME'', and ''Java SE'', respectively.@@@@1@18@@danf@17-8-2009 10440210@unknown@formal@none@1@S@In 1997, Sun Microsystems approached the [[International Organization 
for Standardization#JTC1|ISO/IEC JTC1 standards body]] and later the [[Ecma International]] to formalize Java, but it soon withdrew from the process.@@@@1@28@@danf@17-8-2009 10440220@unknown@formal@none@1@S@Java remains a [[de facto]] standard that is controlled through the [[Java Community Process]].@@@@1@14@@danf@17-8-2009 10440230@unknown@formal@none@1@S@At one time, Sun made most of its Java implementations available without charge although they were [[proprietary software]].@@@@1@18@@danf@17-8-2009 10440240@unknown@formal@none@1@S@Sun's revenue from Java was generated by the selling of licenses for specialized products such as the Java Enterprise System.@@@@1@20@@danf@17-8-2009 10440250@unknown@formal@none@1@S@Sun distinguishes between its [[Software Development Kit|Software Development Kit (SDK)]] and [[HotSpot|Runtime Environment (JRE)]] that is a subset of the SDK, the primary distinction being that in the JRE, the compiler, utility programs, and many necessary header files are not present.@@@@1@41@@danf@17-8-2009 10440260@unknown@formal@none@1@S@On [[13 November]] [[2006]], Sun released much of Java as [[free software|free]] and [[open-source software|open-source]] software under the terms of the [[GNU General Public License]] (GPL).@@@@1@26@@danf@17-8-2009 10440270@unknown@formal@none@1@S@On [[8 May]] [[2007]] Sun finished the process, making all of Java's core code free and open-source, aside from a small portion of code to which Sun did not hold the copyright.@@@@1@32@@danf@17-8-2009 10440280@unknown@formal@none@1@S@== Philosophy ==@@@@1@3@@danf@17-8-2009 10440290@unknown@formal@none@1@S@=== Primary goals ===@@@@1@4@@danf@17-8-2009 10440300@unknown@formal@none@1@S@There were five primary goals in the creation of the Java language:@@@@1@12@@danf@17-8-2009 10440310@unknown@formal@none@1@S@# It should use the [[object-oriented programming]] methodology.@@@@1@8@@danf@17-8-2009 10440320@unknown@formal@none@1@S@# It should allow the same program to be [[execution (computers)|executed]] on multiple [[operating system]]s.@@@@1@15@@danf@17-8-2009 10440330@unknown@formal@none@1@S@# It should contain built-in support for using [[computer network]]s.@@@@1@10@@danf@17-8-2009 10440340@unknown@formal@none@1@S@# It should be designed to execute code from [[remote procedure call|remote source]]s securely.@@@@1@14@@danf@17-8-2009 10440350@unknown@formal@none@1@S@# It should be easy to use by selecting what were considered the good parts of other object-oriented languages.@@@@1@19@@danf@17-8-2009 10440360@unknown@formal@none@1@S@=== Platform independence ===@@@@1@4@@danf@17-8-2009 10440370@unknown@formal@none@1@S@One characteristic, [[Cross-platform|platform independence]], means that [[computer program|program]]s written in the Java language must run similarly on any supported hardware/operating-system platform.@@@@1@21@@danf@17-8-2009 10440380@unknown@formal@none@1@S@One should be able to write a program once, compile it once, and run it anywhere.@@@@1@16@@danf@17-8-2009 10440390@unknown@formal@none@1@S@This is achieved by most Java [[compiler]]s by compiling the Java language code ''halfway'' (to [[Java bytecode]]) – simplified machine instructions specific to the Java platform.@@@@1@26@@danf@17-8-2009 10440400@unknown@formal@none@1@S@The code is then run on a [[virtual machine]] (VM), a program written in native code on the host hardware that [[Interpreter (computing)|interprets]] and executes generic Java bytecode.@@@@1@28@@danf@17-8-2009 10440410@unknown@formal@none@1@S@(In some JVM 
versions, bytecode can also be compiled to native code, either before or during program execution, resulting in faster execution.)@@@@1@22@@danf@17-8-2009 10440420@unknown@formal@none@1@S@Further, standardized libraries are provided to allow access to features of the host machines (such as graphics, [[thread (computer science)|threading]] and [[Computer network|networking]]) in unified ways.@@@@1@26@@danf@17-8-2009 10440430@unknown@formal@none@1@S@Note that, although there is an explicit compilation stage, at some point the Java bytecode is interpreted or converted to native [[machine code]] by the [[Just-in-time compilation|JIT compiler]].@@@@1@28@@danf@17-8-2009 10440440@unknown@formal@none@1@S@The first implementations of the language used an interpreted virtual machine to achieve [[Porting|portability]].@@@@1@14@@danf@17-8-2009 10440450@unknown@formal@none@1@S@These implementations produced programs that ran slower than programs compiled to native executables, for instance written in C or C++, so the language suffered a reputation for poor performance.@@@@1@29@@danf@17-8-2009 10440460@unknown@formal@none@1@S@More recent JVM implementations produce programs that run significantly faster than before, using multiple techniques.@@@@1@15@@danf@17-8-2009 10440470@unknown@formal@none@1@S@One technique, known as ''just-in-time compilation'' (JIT), translates the Java bytecode into native code at the time that the program is run, which results in a program that executes faster than interpreted code but also incurs compilation overhead during execution.@@@@1@40@@danf@17-8-2009 10440480@unknown@formal@none@1@S@More sophisticated VMs use ''[[dynamic recompilation]]'', in which the VM can analyze the behavior of the running program and selectively recompile and optimize critical parts of the program.@@@@1@28@@danf@17-8-2009 10440490@unknown@formal@none@1@S@Dynamic recompilation can achieve optimizations superior to static compilation because the dynamic compiler can base optimizations on knowledge about the runtime environment and the set of loaded classes, and can identify the ''hot spots'' (parts of the program, often inner loops, that take up the most execution time).@@@@1@48@@danf@17-8-2009 10440500@unknown@formal@none@1@S@JIT compilation and dynamic recompilation allow Java programs to take advantage of the speed of native code without losing portability.@@@@1@20@@danf@17-8-2009 10440510@unknown@formal@none@1@S@Another technique, commonly known as ''static compilation'', is to compile directly into native code like a more traditional compiler.@@@@1@19@@danf@17-8-2009 10440520@unknown@formal@none@1@S@Static Java compilers, such as [[GCJ]], translate the Java language code to native [[object code]], removing the intermediate bytecode stage.@@@@1@20@@danf@17-8-2009 10440530@unknown@formal@none@1@S@This achieves good performance compared to interpretation, but at the expense of portability; the output of these compilers can only be run on a single [[Computer architecture|architecture]].@@@@1@27@@danf@17-8-2009 10440540@unknown@formal@none@1@S@Some see avoiding the VM in this manner as defeating the point of developing in Java; however, it can be useful to provide both a generic [[bytecode]] version and an optimised native-code version of an application.@@@@1@39@@danf@17-8-2009 10440550@unknown@formal@none@1@S@=== Implementations ===@@@@1@3@@danf@17-8-2009 10440560@unknown@formal@none@1@S@Sun Microsystems officially licenses the Java Standard Edition platform for [[Microsoft Windows]],
[[Linux]], and [[Solaris (operating system)|Solaris]].@@@@1@17@@danf@17-8-2009 10440570@unknown@formal@none@1@S@Through a network of third-party vendors and licensees, alternative Java environments are available for these and other platforms.@@@@1@18@@danf@17-8-2009 10440580@unknown@formal@none@1@S@To qualify as a certified Java licensee, an implementation on any particular platform must pass a rigorous suite of validation and compatibility tests.@@@@1@23@@danf@17-8-2009 10440590@unknown@formal@none@1@S@This method enables a guaranteed level of compliance and platform compatibility through a trusted set of commercial and non-commercial partners.@@@@1@19@@danf@17-8-2009 10440600@unknown@formal@none@1@S@Sun's trademark license for usage of the Java brand insists that all implementations be "compatible".@@@@1@15@@danf@17-8-2009 10440610@unknown@formal@none@1@S@This resulted in a legal dispute with [[Microsoft]] after Sun claimed that the Microsoft implementation did not support the [[Java remote method invocation|RMI]] and [[Java Native Interface|JNI]] interfaces and had added platform-specific features of its own.@@@@1@36@@danf@17-8-2009 10440620@unknown@formal@none@1@S@Sun sued in 1997, and in 2001 won a settlement of $20 million as well as a court order enforcing the terms of the license from Sun.@@@@1@27@@danf@17-8-2009 10440630@unknown@formal@none@1@S@As a result, Microsoft no longer ships Java with [[Microsoft Windows|Windows]], and in recent versions of Windows, [[Internet Explorer]] cannot support Java applets without a third-party plugin.@@@@1@27@@danf@17-8-2009 10440640@unknown@formal@none@1@S@However, Sun and others have made available Java run-time systems at no cost for those and other versions of Windows.@@@@1@20@@danf@17-8-2009 10440650@unknown@formal@none@1@S@Platform-independent Java is essential to the [[Java Enterprise Edition]] strategy, and an even more rigorous validation is required to certify an implementation.@@@@1@22@@danf@17-8-2009 10440660@unknown@formal@none@1@S@This environment enables portable server-side applications, such as [[Web service]]s, [[servlet]]s, and [[Enterprise JavaBean]]s, as well as [[Embedded system]]s based on [[OSGi]], using [[Embedded Java]] environments.@@@@1@27@@danf@17-8-2009 10440670@unknown@formal@none@1@S@Through the new [[GlassFish]] project, Sun is working to create a fully functional, unified [[open-source]] implementation of the Java EE technologies.@@@@1@21@@danf@17-8-2009 10440680@unknown@formal@none@1@S@=== Automatic memory management ===@@@@1@5@@danf@17-8-2009 10440690@unknown@formal@none@1@S@One of the ideas behind Java's automatic memory management model is that programmers be spared the burden of having to perform manual memory management.@@@@1@24@@danf@17-8-2009 10440700@unknown@formal@none@1@S@In some languages the programmer allocates memory for the creation of objects stored on the [[heap]] and the responsibility of later deallocating that memory also resides with the programmer.@@@@1@29@@danf@17-8-2009 10440710@unknown@formal@none@1@S@If the programmer forgets to deallocate memory or writes code that fails to do so, a [[memory leak]] occurs and the program can consume an arbitrarily large amount of memory.@@@@1@30@@danf@17-8-2009 10440720@unknown@formal@none@1@S@Additionally, if the program attempts to deallocate the region of memory more than once, the result is undefined and the program may become unstable and may crash.@@@@1@27@@danf@17-8-2009 10440730@unknown@formal@none@1@S@Finally, in non-garbage-collected environments,
there is a certain degree of overhead and complexity of user-code to track and finalize allocations.@@@@1@22@@danf@17-8-2009 10440740@unknown@formal@none@1@S@Often developers may box themselves into certain designs to provide reasonable assurances that memory leaks will not occur.@@@@1@18@@danf@17-8-2009 10440750@unknown@formal@none@1@S@In Java, this potential problem is avoided by [[automatic garbage collection]].@@@@1@11@@danf@17-8-2009 10440760@unknown@formal@none@1@S@The programmer determines when objects are created, and the Java runtime is responsible for managing the [[object lifetime|object's lifecycle]].@@@@1@19@@danf@17-8-2009 10440770@unknown@formal@none@1@S@The program or other objects can reference an object by holding a reference to it (which, from a low-level point of view, is its address on the heap).@@@@1@28@@danf@17-8-2009 10440780@unknown@formal@none@1@S@When no references to an object remain, the [[unreachable object]] is eligible for release by the Java garbage collector - it may be freed automatically by the garbage collector at any time.@@@@1@32@@danf@17-8-2009 10440790@unknown@formal@none@1@S@Memory leaks may still occur if a programmer's code holds a reference to an object that is no longer needed—in other words, they can still occur but at higher conceptual levels.@@@@1@31@@danf@17-8-2009 10440800@unknown@formal@none@1@S@The use of garbage collection in a language can also affect programming paradigms.@@@@1@13@@danf@17-8-2009 10440810@unknown@formal@none@1@S@If, for example, the developer assumes that the cost of memory allocation/recollection is low, they may choose to more freely construct objects instead of pre-initializing, holding and reusing them.@@@@1@29@@danf@17-8-2009 10440820@unknown@formal@none@1@S@With the small cost of potential performance penalties (inner-loop construction of large/complex objects), this facilitates thread-isolation (no need to synchronize as different threads work on different object instances) and data-hiding.@@@@1@30@@danf@17-8-2009 10440830@unknown@formal@none@1@S@The use of transient immutable value-objects minimizes side-effect programming.@@@@1@9@@danf@17-8-2009 10440840@unknown@formal@none@1@S@Comparing Java and [[C++]], it is possible in C++ to implement similar functionality (for example, a memory management model for specific classes can be designed in C++ to improve speed and lower memory fragmentation considerably), with the possible cost of adding comparable runtime overhead to that of Java's garbage collector, and of added development time and application complexity if one favors manual implementation over using an existing third-party library.@@@@1@69@@danf@17-8-2009 10440850@unknown@formal@none@1@S@In Java, garbage collection is built-in and virtually invisible to the developer.@@@@1@12@@danf@17-8-2009 10440860@unknown@formal@none@1@S@That is, developers may have no notion of when garbage collection will take place as it may not necessarily correlate with any actions being explicitly performed by the code they write.@@@@1@31@@danf@17-8-2009 10440870@unknown@formal@none@1@S@Depending on intended application, this can be beneficial or disadvantageous: the programmer is freed from performing low-level tasks, but at the same time loses the option of writing lower level code.@@@@1@31@@danf@17-8-2009 10440880@unknown@formal@none@1@S@Additionally, the garbage collection capability demands some attention to tuning the JVM, as large heaps will cause apparently random stalls in performance.@@@@1@22@@danf@17-8-2009 
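To make the memory-management discussion above concrete, the following is a minimal sketch (the Widget class is made up purely for illustration and is not one of the article's examples or part of any library) showing how a Java programmer creates objects, drops references to them, and leaves reclamation entirely to the garbage collector:
 // GcSketch.java -- illustrative sketch only; "Widget" is a hypothetical class.
 public class GcSketch {
     static class Widget {
         private final byte[] data = new byte[1024 * 1024]; // roughly 1 MB per instance
     }

     public static void main(String[] args) {
         Widget w = new Widget(); // allocated on the heap; "w" holds the only reference
         w = null;                // the Widget is now unreachable and eligible for collection

         for (int i = 0; i < 100; i++) {
             Widget temp = new Widget(); // each instance becomes unreachable once its iteration ends
         }

         System.gc(); // only a hint; the JVM decides when (or whether) collection actually runs
     }
 }
Note that no statement in the sketch frees memory explicitly; whether and when the unreachable Widget instances are reclaimed is left entirely to the garbage collector, as described above.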
10440890@unknown@formal@none@1@S@Java does not support [[pointer (computing)|pointer arithmetic]] as is supported in, for example, C++.@@@@1@14@@danf@17-8-2009 10440900@unknown@formal@none@1@S@This is because the garbage collector may relocate referenced objects, invalidating such pointers.@@@@1@13@@danf@17-8-2009 10440910@unknown@formal@none@1@S@Another reason that Java forbids this is that type safety and security can no longer be guaranteed if arbitrary manipulation of pointers is allowed.@@@@1@24@@danf@17-8-2009 10440920@unknown@formal@none@1@S@== Syntax ==@@@@1@3@@danf@17-8-2009 10440930@unknown@formal@none@1@S@The syntax of Java is largely derived from [[C++]].@@@@1@9@@danf@17-8-2009 10440940@unknown@formal@none@1@S@Unlike C++, which combines the syntax for structured, generic, and object-oriented programming, Java was built exclusively as an object-oriented language.@@@@1@21@@danf@17-8-2009 10440950@unknown@formal@none@1@S@As a result, almost everything is an object and all code is written inside a class.@@@@1@16@@danf@17-8-2009 10440960@unknown@formal@none@1@S@The exceptions are the intrinsic data types (ordinal and real numbers, boolean values, and characters), which are not classes for performance reasons.@@@@1@22@@danf@17-8-2009 10440970@unknown@formal@none@1@S@=== Hello, world program ===@@@@1@5@@danf@17-8-2009 10440980@unknown@formal@none@1@S@This is a minimal [[Hello world program]] in Java:@@@@1@12@@danf@17-8-2009 10440990@unknown@formal@none@1@S@
 // Hello.java
 public class Hello {
     public static void main(String[] args) {
         System.out.println("Hello, world!");
     }
 }
@@@@1@19@@danf@17-8-2009 10441000@unknown@formal@none@1@S@To execute a Java program, the code is saved as a file named Hello.java.@@@@1@14@@danf@17-8-2009 10441010@unknown@formal@none@1@S@It must first be compiled into bytecode using a [[Java compiler]], which produces a file named Hello.class.@@@@1@17@@danf@17-8-2009 10441020@unknown@formal@none@1@S@This class is then ''launched''.@@@@1@5@@danf@17-8-2009 10441030@unknown@formal@none@1@S@The above example merits a bit of explanation.@@@@1@8@@danf@17-8-2009 10441040@unknown@formal@none@1@S@* All executable statements in Java are written inside a class, including stand-alone programs.@@@@1@14@@danf@17-8-2009 10441050@unknown@formal@none@1@S@* Source files are by convention named the same as the class they contain, appending the mandatory suffix ''.java''.@@@@1@19@@danf@17-8-2009 10441060@unknown@formal@none@1@S@A '''class''' that is declared '''public''' is required to follow this convention.@@@@1@12@@danf@17-8-2009 10441070@unknown@formal@none@1@S@(In this case, the class '''Hello''' is public, therefore the source must be stored in a file called ''Hello.java'').@@@@1@19@@danf@17-8-2009 10441080@unknown@formal@none@1@S@* The compiler will generate a class file for each class defined in the source file.@@@@1@16@@danf@17-8-2009 10441090@unknown@formal@none@1@S@The name of the class file is the name of the class, with ''.class'' appended.@@@@1@15@@danf@17-8-2009 10441100@unknown@formal@none@1@S@For class file generation, anonymous classes are treated as if their name was the concatenation of the name of their enclosing class, a ''$'', and an integer.@@@@1@27@@danf@17-8-2009 10441110@unknown@formal@none@1@S@* The [[Java keywords|keyword]] '''public''' denotes that a method can be called from code in other classes, or that a class may be used by classes outside the class hierarchy.@@@@1@30@@danf@17-8-2009 10441120@unknown@formal@none@1@S@* The
keyword '''static''' indicates that the method is a [[class method|static method]], associated with the class rather than object instances.@@@@1@21@@danf@17-8-2009 10441130@unknown@formal@none@1@S@* The keyword '''void''' indicates that the main method does not return any value to the caller.@@@@1@17@@danf@17-8-2009 10441140@unknown@formal@none@1@S@* The method name "main" is not a keyword in the Java language.@@@@1@13@@danf@17-8-2009 10441150@unknown@formal@none@1@S@It is simply the name of the method the Java launcher calls to pass control to the program.@@@@1@18@@danf@17-8-2009 10441160@unknown@formal@none@1@S@Java classes that run in managed environments such as applets and [[Enterprise Java Beans]] do not use or need a main() method.@@@@1@22@@danf@17-8-2009 10441170@unknown@formal@none@1@S@* The main method must accept an [[array]] of '''{{Javadoc:SE|java/lang|String}}''' objects.@@@@1@11@@danf@17-8-2009 10441180@unknown@formal@none@1@S@By convention, it is referenced as '''args''' although any other legal identifier name can be used.@@@@1@16@@danf@17-8-2009 10441190@unknown@formal@none@1@S@Since Java 5, the main method can also use [[varargs|variable arguments]], in the form of public static void main(String... args), allowing the main method to be invoked with an arbitrary number of String arguments.@@@@1@34@@danf@17-8-2009 10441200@unknown@formal@none@1@S@The effect of this alternate declaration is semantically identical (the args parameter is still an array of String objects), but allows an alternate syntax for creating and passing the array.@@@@1@30@@danf@17-8-2009 10441210@unknown@formal@none@1@S@* The Java launcher launches Java by loading a given class (specified on the command line) and starting its public static void main(String[]) method.@@@@1@24@@danf@17-8-2009 10441220@unknown@formal@none@1@S@Stand-alone programs must declare this method explicitly.@@@@1@7@@danf@17-8-2009 10441230@unknown@formal@none@1@S@The String[] args parameter is an [[array]] of {{Javadoc:SE|java/lang|String}} objects containing any arguments passed to the class.@@@@1@17@@danf@17-8-2009 10441240@unknown@formal@none@1@S@The parameters to main are often passed by means of a [[command line]].@@@@1@13@@danf@17-8-2009 10441250@unknown@formal@none@1@S@* The printing facility is part of the Java standard library: The '''{{Javadoc:SE|java/lang|System}}''' class defines a public static field called '''{{Javadoc:SE|name=out|java/lang|System|out}}'''.@@@@1@21@@danf@17-8-2009 10441260@unknown@formal@none@1@S@The out object is an instance of the {{Javadoc:SE|java/io|PrintStream}} class and provides the method '''{{Javadoc:SE|name=println(String)|java/io|PrintStream|println(java.lang.String)}}''' for displaying data to the screen while creating a new line ([[standard streams|standard out]]).@@@@1@29@@danf@17-8-2009 10441270@unknown@formal@none@1@S@=== A more comprehensive example ===@@@@1@6@@danf@17-8-2009 10441280@unknown@formal@none@1@S@
 // OddEven.java
 import javax.swing.JOptionPane;

 public class OddEven {
     public static void main(String[] args) {
         // This is the main method. It gets called when this class is run through a Java interpreter.
         OddEven number = new OddEven();
         /* This line of code creates a new instance of this class called "number" and
          * initializes it, and the next line of code calls the "showDialog()" method,
          * which brings up a prompt to ask you for a number */
         number.showDialog();
     }

     private int input; // A whole number ("int" means integer)
                        // "input" is the number that the user gives to the computer
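     // Because "input" is declared private, code outside the OddEven class cannot
     // read or write this field directly; it is manipulated only by the methods
     // of this class.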
     public OddEven() {
         /* This is the constructor method. It gets called when an object of the OddEven type
          * is created. */
     }

     public void showDialog() {
         try /* This makes sure nothing goes wrong. If something does,
              * the interpreter skips to "catch" to see what it should do. */
         {
             input = Integer.parseInt(JOptionPane.showInputDialog("Please Enter A Number"));
             calculate();
             /*
              * The code above brings up a JOptionPane, which is a dialog box.
              * The String returned by the "showInputDialog()" method is converted into
              * an integer, making the program treat it as a number instead of a word.
              * After that, this method calls a second method, calculate(), that will
              * display either "Even" or "Odd."
              */
         } catch (NumberFormatException e) /* This means that there was a problem with the format of the number
                                            * (Like if someone were to type in 'Hello world' instead of a number). */
         {
             System.err.println("ERROR: Invalid input. Please type in a numerical value.");
         }
     }

     private void calculate() {
         if (input % 2 == 0)
             System.out.println("Even");
             /* When this gets called, it sends a message to the interpreter.
              * The interpreter usually shows it on the command prompt (for Windows users)
              * or the terminal (for Linux users), assuming it is open. */
         else
             System.out.println("Odd");
     }
 }
@@@@1@312@@danf@17-8-2009 10441290@unknown@formal@none@1@S@* The '''[[Java keywords#import|import]]''' statement imports the '''{{Javadoc:SE|javax/swing|JOptionPane}}''' class from the '''{{Javadoc:SE|package=javax.swing|javax/swing}}''' package.@@@@1@13@@danf@17-8-2009 10441300@unknown@formal@none@1@S@* The '''OddEven''' class declares a single '''[[Java keywords#private|private]]''' [[field (computer science)|field]] of type '''int''' named '''input'''.@@@@1@17@@danf@17-8-2009 10441310@unknown@formal@none@1@S@Every instance of the OddEven class has its own copy of the input field.@@@@1@14@@danf@17-8-2009 10441320@unknown@formal@none@1@S@The private declaration means that no other class can access (read or write) the input field.@@@@1@16@@danf@17-8-2009 10441330@unknown@formal@none@1@S@* '''OddEven()''' is a '''public''' [[constructor (computer science)|constructor]].@@@@1@8@@danf@17-8-2009 10441340@unknown@formal@none@1@S@Constructors have the same name as the enclosing class they are declared in, and unlike a method, have no [[return type]].@@@@1@21@@danf@17-8-2009 10441350@unknown@formal@none@1@S@A constructor is used to initialize an [[object (computer science)|object]] that is a newly created instance of the class.@@@@1@19@@danf@17-8-2009 10441360@unknown@formal@none@1@S@The dialog returns a String that is converted to an int by the '''{{Javadoc:SE|java/lang|Integer|parseInt(String)}}''' method.@@@@1@15@@danf@17-8-2009 10441370@unknown@formal@none@1@S@* The '''calculate()''' method is declared without the static keyword.@@@@1@10@@danf@17-8-2009 10441380@unknown@formal@none@1@S@This means that the method is invoked using a specific instance of the OddEven class.@@@@1@15@@danf@17-8-2009 10441390@unknown@formal@none@1@S@(The [[reference (computer science)|reference]] used to invoke the method is passed as an undeclared parameter of type OddEven named '''[[Java keywords#this|this]]'''.)@@@@1@21@@danf@17-8-2009 10441400@unknown@formal@none@1@S@The method tests the expression input % 2 == 0 using the '''[[Java keywords#if|if]]''' keyword to see if the remainder of dividing the input field belonging to the instance of the class by two is zero.@@@@1@36@@danf@17-8-2009 10441410@unknown@formal@none@1@S@If this expression is true, then it prints
'''Even'''; if this expression is false it prints '''Odd'''.@@@@1@17@@danf@17-8-2009 10441420@unknown@formal@none@1@S@(The input field can be equivalently accessed as this.input, which explicitly uses the undeclared this parameter.)@@@@1@16@@danf@17-8-2009 10441430@unknown@formal@none@1@S@* '''OddEven number = new OddEven();''' declares a local object [[reference (computer science)|reference]] variable in the main method named number.@@@@1@20@@danf@17-8-2009 10441440@unknown@formal@none@1@S@This variable can hold a reference to an object of type OddEven.@@@@1@12@@danf@17-8-2009 10441450@unknown@formal@none@1@S@The declaration initializes number by first creating an instance of the OddEven class, using the '''[[Java keywords#new|new]]''' keyword and the OddEven() constructor, and then assigning this instance to the variable.@@@@1@30@@danf@17-8-2009 10441460@unknown@formal@none@1@S@* The statement '''number.showDialog();''' calls the showDialog method.@@@@1@8@@danf@17-8-2009 10441470@unknown@formal@none@1@S@The instance of OddEven object referenced by the number [[local variable]] is used to invoke the method and is passed as the undeclared this parameter; showDialog() in turn calls the calculate method.@@@@1@28@@danf@17-8-2009 10441480@unknown@formal@none@1@S@* Basic [[error handling]] is included in this example.@@@@1@11@@danf@17-8-2009 10441490@unknown@formal@none@1@S@Entering a value that is not a number would otherwise cause the program to crash.@@@@1@14@@danf@17-8-2009 10441500@unknown@formal@none@1@S@This is avoided by catching and handling the {{Javadoc:SE|java/lang|NumberFormatException}} thrown by Integer.parseInt(String), so that an error message is printed instead.@@@@1@13@@danf@17-8-2009 10441510@unknown@formal@none@1@S@=== Applet ===@@@@1@3@@danf@17-8-2009 10441520@unknown@formal@none@1@S@Java applets are programs that are embedded in other applications, typically in a Web page displayed in a [[Web browser]].@@@@1@20@@danf@17-8-2009 10441530@unknown@formal@none@1@S@ // Hello.java import java.applet.Applet; import java.awt.Graphics; public class Hello extends Applet { public void paint(Graphics gc) { gc.drawString("Hello, world!", 65, 95); } } @@@@1@25@@danf@17-8-2009 10441540@unknown@formal@none@1@S@The '''import''' statements direct the [[Java compiler]] to include the '''{{Javadoc:SE|package=java.applet|java/applet|Applet}}''' and '''{{Javadoc:SE|package=java.awt|java/awt|Graphics}}''' classes in the compilation.@@@@1@17@@danf@17-8-2009 10441550@unknown@formal@none@1@S@The import statement allows these classes to be referenced in the [[source code]] using the ''simple class name'' (i.e. Applet) instead of the ''fully qualified class name'' (i.e. 
java.applet.Applet).@@@@1@29@@danf@17-8-2009 10441560@unknown@formal@none@1@S@The Hello class '''extends''' ([[subclass (computer science)|subclasses]]) the '''Applet''' class; the Applet class provides the framework for the host application to display and control the [[Object lifetime|lifecycle]] of the applet.@@@@1@30@@danf@17-8-2009 10441570@unknown@formal@none@1@S@The Applet class is an [[Abstract Windowing Toolkit]] (AWT) {{Javadoc:SE|java/awt|Component}}, which provides the applet with the capability to display a [[graphical user interface]] (GUI) and respond to user [[event-driven programming|events]].@@@@1@30@@danf@17-8-2009 10441580@unknown@formal@none@1@S@The Hello class [[method overriding (programming)|overrides]] the '''{{Javadoc:SE|name=paint(Graphics)|java/awt|Container|paint(java.awt.Graphics)}}''' method inherited from the {{Javadoc:SE|java/awt|Container}} [[superclass (computer science)|superclass]] to provide the code to display the applet.@@@@1@24@@danf@17-8-2009 10441590@unknown@formal@none@1@S@The paint() method is passed a '''Graphics''' object that contains the graphic context used to display the applet.@@@@1@18@@danf@17-8-2009 10441600@unknown@formal@none@1@S@The paint() method calls the graphic context '''{{Javadoc:SE|name=drawString(String, int, int)|java/awt|Graphics|drawString(java.lang.String,%20int,%20int)}}''' method to display the '''"Hello, world!"''' string at a [[pixel]] offset of ('''65, 95''') from the upper-left corner in the applet's display.@@@@1@32@@danf@17-8-2009 10441610@unknown@formal@none@1@S@ <!-- Hello.html --> <html> <head> <title>Hello World Applet</title> </head> <body> <applet code="Hello" width="200" height="200"> </applet> </body> </html> @@@@1@16@@danf@17-8-2009 10441620@unknown@formal@none@1@S@An applet is placed in an [[HTML]] document using the '''<applet>''' [[HTML element]].@@@@1@13@@danf@17-8-2009 10441630@unknown@formal@none@1@S@The applet tag has three attributes set: '''code="Hello"''' specifies the name of the Applet class and '''width="200" height="200"''' sets the pixel width and height of the applet.@@@@1@27@@danf@17-8-2009 10441640@unknown@formal@none@1@S@Applets may also be embedded in HTML using either the object or embed element, although support for these elements by Web browsers is inconsistent.@@@@1@24@@danf@17-8-2009 10441650@unknown@formal@none@1@S@However, the applet tag is deprecated, so the object tag is preferred where supported.@@@@1@14@@danf@17-8-2009 10441660@unknown@formal@none@1@S@The host application, typically a Web browser, instantiates the '''Hello''' applet and creates an {{Javadoc:SE|java/applet|AppletContext}} for the applet.@@@@1@18@@danf@17-8-2009 10441670@unknown@formal@none@1@S@Once the applet has initialized itself, it is added to the AWT display hierarchy.@@@@1@14@@danf@17-8-2009 10441680@unknown@formal@none@1@S@The paint method is called by the AWT [[event dispatching thread]] whenever the display needs the applet to draw itself.@@@@1@20@@danf@17-8-2009 10441690@unknown@formal@none@1@S@=== Servlet ===@@@@1@3@@danf@17-8-2009 10441700@unknown@formal@none@1@S@Java Servlet technology provides Web developers with a simple, consistent mechanism for extending the functionality of a Web server and for accessing existing business systems.@@@@1@25@@danf@17-8-2009 10441710@unknown@formal@none@1@S@Servlets are [[server-side]] Java EE components that generate responses (typically [[HTML]] pages) to requests (typically [[HTTP]] requests) from [[client (computing)|client]]s.@@@@1@20@@danf@17-8-2009 10441720@unknown@formal@none@1@S@A servlet can almost be thought of as an applet that runs on the server side—without a 
face.@@@@1@18@@danf@17-8-2009 10441730@unknown@formal@none@1@S@ // Hello.java import java.io.*; import javax.servlet.*;public class Hello extends GenericServlet { public void service(ServletRequest request, ServletResponse response) throws ServletException, IOException { response.setContentType("text/html"); final PrintWriter pw = response.getWriter(); pw.println("Hello, world!"); pw.close(); } } @@@@1@35@@danf@17-8-2009 10441740@unknown@formal@none@1@S@The '''import''' statements direct the Java compiler to include all of the public classes and [[interface (Java)|interfaces]] from the '''{{Javadoc:SE|package=java.io|java/io}}''' and '''{{Javadoc:EE|package=javax.servlet|javax/servlet}}''' [[Java package|packages]] in the compilation.@@@@1@27@@danf@17-8-2009 10441750@unknown@formal@none@1@S@The '''Hello''' class '''extends''' the '''{{Javadoc:EE|javax/servlet|GenericServlet}}''' class; the GenericServlet class provides the interface for the [[server (computing)|server]] to forward requests to the servlet and control the servlet's lifecycle.@@@@1@28@@danf@17-8-2009 10441760@unknown@formal@none@1@S@The Hello class overrides the '''{{Javadoc:EE|name=service(ServletRequest, ServletResponse)|javax/servlet|Servlet|service(javax.servlet.ServletRequest,javax.servlet.ServletResponse)}}''' method defined by the {{Javadoc:EE|javax/servlet|Servlet}} [[Interface (Java)|interface]] to provide the code for the service request handler.@@@@1@23@@danf@17-8-2009 10441770@unknown@formal@none@1@S@The service() method is passed a '''{{Javadoc:EE|javax/servlet|ServletRequest}}''' object that contains the request from the client and a '''{{Javadoc:EE|javax/servlet|ServletResponse}}''' object used to create the response returned to the client.@@@@1@28@@danf@17-8-2009 10441780@unknown@formal@none@1@S@The service() method declares that it '''throws''' the [[exception handling|exceptions]] {{Javadoc:EE|javax/servlet|ServletException}} and {{Javadoc:SE|java/io|IOException}} if a problem prevents it from responding to the request.@@@@1@23@@danf@17-8-2009 10441790@unknown@formal@none@1@S@The '''{{Javadoc:EE|name=setContentType(String)|javax/servlet|ServletResponse|setContentType(java.lang.String)}}''' method in the response object is called to set the [[MIME]] content type of the returned data to '''"text/html"'''.@@@@1@21@@danf@17-8-2009 10441800@unknown@formal@none@1@S@The '''{{Javadoc:EE|name=getWriter()|javax/servlet|ServletResponse|getWriter()}}''' method in the response returns a '''{{Javadoc:SE|java/io|PrintWriter}}''' object that is used to write the data that is sent to the client.@@@@1@23@@danf@17-8-2009 10441810@unknown@formal@none@1@S@The '''{{Javadoc:SE|name=println(String)|java/io|PrintWriter|println(java.lang.String)}}''' method is called to write the '''"Hello, world!"''' string to the response and then the '''{{Javadoc:SE|name=close()|java/io|PrintWriter|close()}}''' method is called to close the print writer, which causes the data that has been written to the stream to be returned to the client.@@@@1@43@@danf@17-8-2009 10441820@unknown@formal@none@1@S@=== JavaServer Page ===@@@@1@4@@danf@17-8-2009 10441830@unknown@formal@none@1@S@JavaServer Pages (JSPs) are [[server-side]] Java EE components that generate responses, typically [[HTML]] pages, to [[HTTP]] requests from [[client (computing)|client]]s.@@@@1@20@@danf@17-8-2009 10441840@unknown@formal@none@1@S@JSPs embed Java code in an HTML page by using the special [[delimiter]]s <% and %>.@@@@1@18@@danf@17-8-2009 10441850@unknown@formal@none@1@S@A JSP is 
compiled to a Java ''servlet'', a Java application in its own right, the first time it is accessed.@@@@1@21@@danf@17-8-2009 10441860@unknown@formal@none@1@S@After that, the generated servlet creates the response.@@@@1@8@@danf@17-8-2009 10441870@unknown@formal@none@1@S@=== Swing application ===@@@@1@4@@danf@17-8-2009 10441880@unknown@formal@none@1@S@Swing is a graphical user interface [[library (computer science)|library]] for the Java SE platform.@@@@1@14@@danf@17-8-2009 10441890@unknown@formal@none@1@S@This example Swing application creates a single window with "Hello, world!" inside:@@@@1@12@@danf@17-8-2009 10441900@unknown@formal@none@1@S@ // Hello.java (Java SE 5) import java.awt.BorderLayout; import javax.swing.*;public class Hello extends JFrame { public Hello() { super("hello"); setDefaultCloseOperation(WindowConstants.EXIT_ON_CLOSE); setLayout(new BorderLayout()); add(new JLabel("Hello, world!")); pack(); }public static void main(String[] args) { new Hello().setVisible(true); } } @@@@1@38@@danf@17-8-2009 10441910@unknown@formal@none@1@S@The first '''import''' statement directs the Java compiler to include the {{Javadoc:SE|java/awt|BorderLayout}} class from the {{Javadoc:SE|package=java.awt|java/awt}} package in the compilation; the second '''import''' includes all of the public classes and interfaces from the '''{{Javadoc:SE|package=javax.swing|javax/swing}}''' package.@@@@1@35@@danf@17-8-2009 10441920@unknown@formal@none@1@S@The '''Hello''' class '''extends''' the '''{{Javadoc:SE|javax/swing|JFrame}}''' class; the JFrame class implements a [[window (computing)|window]] with a [[title bar]] and a close [[Widget (computing)|control]].@@@@1@23@@danf@17-8-2009 10441930@unknown@formal@none@1@S@The '''Hello()''' [[constructor (computer science)|constructor]] initializes the frame by first calling the superclass constructor, passing the parameter "hello", which is used as the window's title.@@@@1@25@@danf@17-8-2009 10441940@unknown@formal@none@1@S@It then calls the '''{{Javadoc:SE|name=setDefaultCloseOperation(int)|javax/swing|JFrame|setDefaultCloseOperation(int)}}''' method inherited from JFrame to set the default operation when the close control on the title bar is selected to '''{{Javadoc:SE|javax/swing|WindowConstants|EXIT_ON_CLOSE}}''' — this causes the JFrame to be disposed of when the frame is closed (as opposed to merely hidden), which allows the JVM to exit and the program to terminate.@@@@1@56@@danf@17-8-2009 10441950@unknown@formal@none@1@S@Next, the [[Layout manager|layout]] of the frame is set to a BorderLayout; this tells Swing how to arrange the components that will be added to the frame.@@@@1@27@@danf@17-8-2009 10441960@unknown@formal@none@1@S@A '''{{Javadoc:SE|javax/swing|JLabel}}''' is created for the string '''"Hello, world!"''' and the '''{{Javadoc:SE|name=add(Component)|java/awt|Container|add(java.awt.Component)}}''' method inherited from the {{Javadoc:SE|java/awt|Container}} superclass is called to add the label to the frame.@@@@1@27@@danf@17-8-2009 10441970@unknown@formal@none@1@S@The '''{{Javadoc:SE|name=pack()|java/awt|Window|pack()}}''' method inherited from the {{Javadoc:SE|java/awt|Window}} superclass is called to size the window and lay out its contents, in the manner indicated by the BorderLayout.@@@@1@26@@danf@17-8-2009 10441980@unknown@formal@none@1@S@The '''main()''' method is called by the JVM when the program starts.@@@@1@12@@danf@17-8-2009 10441990@unknown@formal@none@1@S@It [[Instance (programming)|instantiates]] a new '''Hello''' frame and causes it 
to be displayed by calling the '''{{Javadoc:SE|name=setVisible(boolean)|java/awt|Component|setVisible(boolean)}}''' method inherited from the {{Javadoc:SE|java/awt|Component}} superclass with the boolean parameter '''true'''.@@@@1@28@@danf@17-8-2009 10442000@unknown@formal@none@1@S@Note that once the frame is displayed, exiting the main method does not cause the program to terminate because the AWT [[event dispatching thread]] remains active until all of the Swing top-level windows have been disposed.@@@@1@36@@danf@17-8-2009 10442010@unknown@formal@none@1@S@== Criticism ==@@@@1@3@@danf@17-8-2009 10442020@unknown@formal@none@1@S@[[Java performance|Java's performance]] has improved substantially since the early versions, and performance of [[JIT compiler]]s relative to native compilers has in some tests been shown to be quite similar.@@@@1@29@@danf@17-8-2009 10442030@unknown@formal@none@1@S@The performance of the compilers does not necessarily indicate the performance of the compiled code; only careful testing can reveal the true performance issues in any system.@@@@1@27@@danf@17-8-2009 10442040@unknown@formal@none@1@S@The default [[look and feel]] of [[Graphical User Interface|GUI]] applications written in Java using the [[Swing (Java)|Swing]] toolkit is very different from native applications.@@@@1@24@@danf@17-8-2009 10442050@unknown@formal@none@1@S@It is possible to specify a different look and feel through the [[pluggable look and feel]] system of Swing.@@@@1@19@@danf@17-8-2009 10442060@unknown@formal@none@1@S@Clones of [[Microsoft Windows|Windows]], [[GTK]] and [[Motif (widget toolkit)|Motif]] are supplied by Sun.@@@@1@13@@danf@17-8-2009 10442070@unknown@formal@none@1@S@[[Apple Computer|Apple]] also provides an [[Aqua (theme)|Aqua]] look and feel for [[Mac OS X]].@@@@1@14@@danf@17-8-2009 10442080@unknown@formal@none@1@S@Though prior implementations of these looks and feels have been considered lacking, Swing in Java SE 6 addresses this problem by using more native [[Widget (computing)|widget]] drawing routines of the underlying platforms.@@@@1@32@@danf@17-8-2009 10442090@unknown@formal@none@1@S@Alternatively, third party toolkits such as [[wx4j]], [[Qt (toolkit)|Qt Jambi]] or [[Standard Widget Toolkit|SWT]] may be used for increased integration with the native windowing system.@@@@1@25@@danf@17-8-2009 10442100@unknown@formal@none@1@S@As in C++ and some other object-oriented languages, variables of Java's [[primitive type]]s were not originally objects.@@@@1@17@@danf@17-8-2009 10442110@unknown@formal@none@1@S@Values of primitive types are either stored directly in fields (for objects) or on the [[Stack-based memory allocation|stack]] (for methods) rather than on the heap, as is the common case for objects (but see [[Escape analysis]]).@@@@1@36@@danf@17-8-2009 10442120@unknown@formal@none@1@S@This was a conscious decision by Java's designers for performance reasons.@@@@1@11@@danf@17-8-2009 10442130@unknown@formal@none@1@S@Because of this, Java was not considered to be a pure object-oriented programming language.@@@@1@14@@danf@17-8-2009 10442140@unknown@formal@none@1@S@However, as of Java 5.0, [[Object type|autoboxing]] enables programmers to write code as if primitive types were their wrapper classes, and to interchange freely between a primitive type and its object-oriented counterpart for improved flexibility.@@@@1@36@@danf@17-8-2009
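The following is a minimal illustrative sketch of what autoboxing and unboxing allow; the class name Boxing and the sample values are invented for this illustration and are not part of the article's examples:

 // Boxing.java
 import java.util.ArrayList;
 import java.util.List;
 
 public class Boxing {
     public static void main(String[] args) {
         // Since Java 5.0, an int value can be assigned to an Integer variable (autoboxing)...
         Integer boxed = 42;
         // ...and an Integer can be used where an int is expected (unboxing).
         int unboxed = boxed + 1;
 
         // Autoboxing also lets primitive values be stored in collections,
         // whose element types must be reference types.
         List<Integer> numbers = new ArrayList<Integer>();
         numbers.add(7);              // the int 7 is boxed to an Integer
         int first = numbers.get(0);  // the stored Integer is unboxed back to an int
 
         System.out.println(unboxed + " " + first);
     }
 }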
10442150@unknown@formal@none@1@S@Java suppresses several features (such as [[operator overloading]] and [[multiple inheritance]]) for ''classes'' in order to simplify the language, to "save the programmers from themselves", and to prevent possible errors and anti-pattern design.@@@@1@33@@danf@17-8-2009 10442160@unknown@formal@none@1@S@This has been a source of criticism, relating to a lack of low-level features, but some of these limitations may be worked around.@@@@1@23@@danf@17-8-2009 10442170@unknown@formal@none@1@S@Java ''interfaces'', however, have always supported multiple inheritance.@@@@1@7@@danf@17-8-2009 10442180@unknown@formal@none@1@S@== Resources ==@@@@1@3@@danf@17-8-2009 10442190@unknown@formal@none@1@S@=== Java Runtime Environment ===@@@@1@5@@danf@17-8-2009 10442200@unknown@formal@none@1@S@The Java Runtime Environment, or ''JRE'', is the software required to run any [[Application software|application]] deployed on the Java Platform.@@@@1@20@@danf@17-8-2009 10442210@unknown@formal@none@1@S@[[End-user]]s commonly use a JRE in [[Software package (programming)|software package]]s and Web browser [[plugin]]s.@@@@1@14@@danf@17-8-2009 10442220@unknown@formal@none@1@S@Sun also distributes a superset of the JRE called the Java 2 [[SDK]] (more commonly known as the JDK), which includes development tools such as the [[Java compiler]], [[Javadoc]], [[JAR (file format)|Jar]] and [[debugger]].@@@@1@34@@danf@17-8-2009 10442230@unknown@formal@none@1@S@One of the advantages of running programs on a runtime engine is that errors (exceptions) should not 'crash' the system.@@@@1@21@@danf@17-8-2009 10442240@unknown@formal@none@1@S@Moreover, in runtime engine environments such as Java, tools can attach to the runtime engine and, every time an exception of interest occurs, record the debugging information that existed in memory at the time the exception was thrown (stack and heap values).@@@@1@45@@danf@17-8-2009 10442250@unknown@formal@none@1@S@These [[Automated Exception Handling]] tools provide 'root-cause' information for exceptions in Java programs that run in production, testing or development environments.@@@@1@21@@danf@17-8-2009 10442260@unknown@formal@none@1@S@==== Components ====@@@@1@3@@danf@17-8-2009 10442270@unknown@formal@none@1@S@* Java [[Library (computer science)|libraries]] are the compiled [[byte code]]s of [[source code]] developed by the JRE implementor to support application development in Java.@@@@1@24@@danf@17-8-2009 10442280@unknown@formal@none@1@S@Examples of these libraries are:@@@@1@5@@danf@17-8-2009 10442290@unknown@formal@none@1@S@** The core libraries, which include:@@@@1@6@@danf@17-8-2009 10442300@unknown@formal@none@1@S@*** Collection libraries that implement [[data structure]]s such as [[List (computing)|lists]], [[associative array|dictionaries]], [[tree structure|trees]] and [[Set (computer science)|sets]]@@@@1@19@@danf@17-8-2009 10442310@unknown@formal@none@1@S@*** [[XML]] Processing (Parsing, Transforming, Validating) libraries@@@@1@7@@danf@17-8-2009 10442320@unknown@formal@none@1@S@*** Security@@@@1@2@@danf@17-8-2009 10442330@unknown@formal@none@1@S@*** [[i18n|Internationalization and localization]] libraries@@@@1@5@@danf@17-8-2009 10442340@unknown@formal@none@1@S@** The integration libraries, which allow the application writer to communicate with external systems.@@@@1@14@@danf@17-8-2009 10442350@unknown@formal@none@1@S@These libraries include:@@@@1@3@@danf@17-8-2009 10442360@unknown@formal@none@1@S@*** The [[Java Database Connectivity]] (JDBC) [[Application Programming Interface|API]] for database access@@@@1@12@@danf@17-8-2009 10442370@unknown@formal@none@1@S@*** [[Java Naming and 
Directory Interface]] (JNDI) for lookup and discovery@@@@1@11@@danf@17-8-2009 10442380@unknown@formal@none@1@S@*** [[Java remote method invocation|RMI]] and [[CORBA]] for distributed application development@@@@1@11@@danf@17-8-2009 10442390@unknown@formal@none@1@S@** [[User Interface]] libraries, which include:@@@@1@6@@danf@17-8-2009 10442400@unknown@formal@none@1@S@*** The (heavyweight, or [[native mode|native]]) [[Abstract Windowing Toolkit]] (AWT), which provides [[graphical user interface|GUI]] components, the means for laying out those components and the means for handling events from those components@@@@1@32@@danf@17-8-2009 10442410@unknown@formal@none@1@S@*** The (lightweight) [[Swing (Java)|Swing]] libraries, which are built on AWT but provide (non-native) implementations of the AWT widgetry@@@@1@19@@danf@17-8-2009 10442420@unknown@formal@none@1@S@*** APIs for audio capture, processing, and playback@@@@1@8@@danf@17-8-2009 10442430@unknown@formal@none@1@S@* A platform dependent implementation of [[Java virtual machine]] (JVM) that is the means by which the byte codes of the Java libraries and third party applications are executed@@@@1@29@@danf@17-8-2009 10442440@unknown@formal@none@1@S@* Plugins, which enable [[Java applet|applet]]s to be run in [[Web browser]]s@@@@1@12@@danf@17-8-2009 10442450@unknown@formal@none@1@S@* [[Java Web Start]], which allows Java applications to be efficiently distributed to [[end user]]s across the [[Internet]]@@@@1@18@@danf@17-8-2009 10442460@unknown@formal@none@1@S@* Licensing and documentation@@@@1@4@@danf@17-8-2009 10442470@unknown@formal@none@1@S@=== APIs ===@@@@1@3@@danf@17-8-2009 10442480@unknown@formal@none@1@S@Sun has defined three platforms targeting different application environments and segmented many of its [[application programming interface|API]]s so that they belong to one of the platforms.@@@@1@26@@danf@17-8-2009 10442490@unknown@formal@none@1@S@The platforms are:@@@@1@3@@danf@17-8-2009 10442500@unknown@formal@none@1@S@* [[Java Platform, Micro Edition]] (Java ME) — targeting environments with limited resources,@@@@1@13@@danf@17-8-2009 10442510@unknown@formal@none@1@S@* [[Java Platform, Standard Edition]] (Java SE) — targeting workstation environments, and@@@@1@12@@danf@17-8-2009 10442520@unknown@formal@none@1@S@* [[Java Platform, Enterprise Edition]] (Java EE) — targeting large distributed enterprise or Internet environments.@@@@1@15@@danf@17-8-2009 10442530@unknown@formal@none@1@S@The [[Class (computer science)|classes]] in the Java APIs are organized into separate groups called [[Java package|packages]].@@@@1@16@@danf@17-8-2009 10442540@unknown@formal@none@1@S@Each package contains a set of related [[Interface (Java)|interface]]s, classes and [[exception handling|exceptions]].@@@@1@13@@danf@17-8-2009 10442550@unknown@formal@none@1@S@Refer to the separate platforms for a description of the packages available.@@@@1@12@@danf@17-8-2009 10442560@unknown@formal@none@1@S@The set of APIs is controlled by Sun Microsystems in cooperation with others through the [[Java Community Process]] program.@@@@1@19@@danf@17-8-2009 10442570@unknown@formal@none@1@S@Companies or individuals participating in this process can influence the design and development of the APIs.@@@@1@16@@danf@17-8-2009 10442580@unknown@formal@none@1@S@This process has been a subject of controversy.@@@@1@8@@danf@17-8-2009 10450010@unknown@formal@none@1@S@
Language
@@@@1@1@@danf@17-8-2009 10450020@unknown@formal@none@1@S@A '''language''' is a dynamic set of visual, auditory, or tactile [[symbol]]s of [[communication]] and the elements used to manipulate them.@@@@1@21@@danf@17-8-2009 10450030@unknown@formal@none@1@S@''Language'' can also refer to the use of such systems as a general [[phenomenon]].@@@@1@14@@danf@17-8-2009 10450040@unknown@formal@none@1@S@Language is considered to be an exclusively human mode of communication; although other animals make use of quite sophisticated communicative systems, none of these are known to make use of all of the properties that linguists use to define language.@@@@1@40@@danf@17-8-2009 10450050@unknown@formal@none@1@S@== Properties of language ==@@@@1@5@@danf@17-8-2009 10450060@unknown@formal@none@1@S@A set of agreed-upon symbols is only one feature of language; all languages must define the structural relationships between these symbols in a system of [[grammar]].@@@@1@26@@danf@17-8-2009 10450070@unknown@formal@none@1@S@Rules of grammar are what distinguish language from other forms of communication.@@@@1@12@@danf@17-8-2009 10450080@unknown@formal@none@1@S@They allow a finite set of symbols to be manipulated to create a potentially infinite number of grammatical utterances.@@@@1@19@@danf@17-8-2009 10450090@unknown@formal@none@1@S@Another property of language is that its symbols are [[arbitrary]].@@@@1@10@@danf@17-8-2009 10450100@unknown@formal@none@1@S@Any concept or grammatical rule can be mapped onto a symbol.@@@@1@11@@danf@17-8-2009 10450110@unknown@formal@none@1@S@Most languages make use of sound, but the combinations of sounds used do not have any ''inherent'' meaning – they are merely an agreed-upon convention to represent a certain thing by users of that language.@@@@1@35@@danf@17-8-2009 10450120@unknown@formal@none@1@S@For instance, there is nothing about the [[Spanish language|Spanish]] [[word]] ''{{lang|es|nada}}'' itself that forces Spanish speakers to convey the idea of "nothing".@@@@1@22@@danf@17-8-2009 10450130@unknown@formal@none@1@S@Another set of sounds (for example, the English word ''nothing'') could equally be used to represent the same concept, but all Spanish speakers have acquired or learned to correlate this meaning for this particular sound pattern.@@@@1@36@@danf@17-8-2009 10450140@unknown@formal@none@1@S@For [[Slovene language|Slovenian]], [[Croatian language|Croatian]], [[Serbian language|Serbian/Kosovan]] or [[Bosnian language|Bosnian]] speakers on the other hand, ''{{lang|hr|nada}}'' means something else; it means "hope".@@@@1@22@@danf@17-8-2009 10450150@unknown@formal@none@1@S@==The study of language==@@@@1@4@@danf@17-8-2009 10450160@unknown@formal@none@1@S@===Linguistics===@@@@1@1@@danf@17-8-2009 10450170@unknown@formal@none@1@S@[[Linguistics]] is the [[science|scientific]] and [[philosophy|philosophical]] study of language, encompassing a number of sub-fields.@@@@1@14@@danf@17-8-2009 10450180@unknown@formal@none@1@S@At the core of [[theoretical linguistics]] are the study of language structure ([[grammar]]) and the study of meaning ([[semantics]]).@@@@1@19@@danf@17-8-2009 10450190@unknown@formal@none@1@S@The first of these encompasses [[morphology (linguistics)|morphology]] (the formation and composition of [[word]]s), [[syntax]] (the rules that determine how words combine into [[phrase]]s and [[Sentence (linguistics)|sentences]]) and [[phonology]] (the study of sound systems and abstract sound units).@@@@1@37@@danf@17-8-2009 10450200@unknown@formal@none@1@S@[[Phonetics]] is a 
related branch of linguistics concerned with the actual properties of speech sounds ([[phone]]s), non-speech sounds, and how they are produced and [[speech perception|perceived]].@@@@1@26@@danf@17-8-2009 10450210@unknown@formal@none@1@S@[[Theoretical linguistics]] is mostly concerned with developing models of linguistic knowledge.@@@@1@11@@danf@17-8-2009 10450220@unknown@formal@none@1@S@The fields that are generally considered as the core of theoretical linguistics are [[syntax]], [[phonology]], [[Morphology (linguistics)|morphology]], and [[semantics]].@@@@1@19@@danf@17-8-2009 10450230@unknown@formal@none@1@S@[[Applied linguistics]] attempts to put linguistic theories into practice through areas like [[translation]], [[Stylistics (linguistics)|stylistics]], [[literary criticism]] and [[Literary theory|theory]], [[discourse analysis]], [[speech therapy]], speech pathology and [[Second language acquisition|foreign language teaching]].@@@@1@32@@danf@17-8-2009 10450240@unknown@formal@none@1@S@===History===@@@@1@1@@danf@17-8-2009 10450250@unknown@formal@none@1@S@The historical record of [[linguistics]] begins in [[India]] with [[Pāṇini]], the [[5th century BCE]] grammarian who formulated 3,959 rules of [[Sanskrit language|Sanskrit]] [[morphology (linguistics)|morphology]], known as the ''{{IAST|[[Aṣṭādhyāyī]]}}'' (अष्टाध्यायी), and with [[Tolkāppiyar]], the [[3rd century BCE]] grammarian of the [[Tamil language|Tamil]] work [[Tolkāppiyam]]. Pāṇini's grammar is highly systematized and technical.@@@@1@49@@danf@17-8-2009 10450260@unknown@formal@none@1@S@Inherent in its analytic approach are the concepts of the [[phoneme]], the [[morpheme]], and the [[Root (linguistics)|root]]; Western linguists only recognized the phoneme some two millennia later.@@@@1@27@@danf@17-8-2009 10450270@unknown@formal@none@1@S@Tolkāppiyar's work is perhaps the first to describe [[articulatory phonetics]] for a language.@@@@1@13@@danf@17-8-2009 10450280@unknown@formal@none@1@S@Its classification of the alphabet into [[consonant]]s and [[vowel]]s, and of elements such as nouns and verbs into distinct classes, was also a breakthrough at the time.@@@@1@28@@danf@17-8-2009 10450290@unknown@formal@none@1@S@In the [[Middle East]], the [[Persian Empire|Persian]] linguist [[Sibawayh]] (سیبویه) made a detailed and professional description of [[Arabic language|Arabic]] in 760 CE in his monumental work, ''Al-kitab fi al-nahw'' (الكتاب في النحو, ''The Book on Grammar''), bringing many [[Linguistics|linguistic]] aspects of language to light.@@@@1@44@@danf@17-8-2009 10450300@unknown@formal@none@1@S@In his book, he distinguished [[phonetics]] from [[phonology]].@@@@1@8@@danf@17-8-2009 10450310@unknown@formal@none@1@S@Later in the West, the success of [[science]], [[mathematics]], and other [[formal system]]s in the 20th century led many to attempt a formalization of the study of language as a "semantic code".@@@@1@32@@danf@17-8-2009 10450320@unknown@formal@none@1@S@This resulted in the [[academic discipline]] of [[linguistics]], the founding of which is attributed to [[Ferdinand de Saussure]].@@@@1@18@@danf@17-8-2009 10450330@unknown@formal@none@1@S@In the 20th century, substantial contributions to the understanding of language came from [[Ferdinand de Saussure]], [[Hjelmslev]], [[Émile Benveniste]] and [[Roman Jakobson]], whose work is characterized as being highly [[systematic]].@@@@1@29@@danf@17-8-2009 10450340@unknown@formal@none@1@S@== Human languages ==@@@@1@4@@danf@17-8-2009 10450350@unknown@formal@none@1@S@Human languages 
are usually referred to as natural languages, and the science of studying them falls under the purview of [[linguistics]].@@@@1@21@@danf@17-8-2009 10450360@unknown@formal@none@1@S@A common progression for natural languages is that they are considered to be first spoken, then written, and then an understanding and explanation of their grammar is attempted.@@@@1@28@@danf@17-8-2009 10450370@unknown@formal@none@1@S@Languages live, die, move from place to place, and change with time.@@@@1@12@@danf@17-8-2009 10450380@unknown@formal@none@1@S@Any language that ceases to change or develop is categorized as a [[dead language]].@@@@1@14@@danf@17-8-2009 10450390@unknown@formal@none@1@S@Conversely, any language that is a ''living language,'' that is, it is in a continuous state of change, is known as a [[modern language]].@@@@1@24@@danf@17-8-2009 10450400@unknown@formal@none@1@S@Making a principled distinction between one language and another is usually impossible.@@@@1@12@@danf@17-8-2009 10450410@unknown@formal@none@1@S@For instance, there are a few [[dialect]]s of [[German language|German]] similar to some dialects of [[Dutch language|Dutch]].@@@@1@17@@danf@17-8-2009 10450420@unknown@formal@none@1@S@The transition between languages within the same [[language family]] is sometimes gradual (see [[dialect continuum]]).@@@@1@15@@danf@17-8-2009 10450430@unknown@formal@none@1@S@Some like to make parallels with [[biology]], where it is not possible to make a well-defined distinction between one species and the next.@@@@1@23@@danf@17-8-2009 10450440@unknown@formal@none@1@S@In either case, the ultimate difficulty may stem from the [[interaction]]s between languages and [[population]]s.@@@@1@15@@danf@17-8-2009 10450450@unknown@formal@none@1@S@(See [[Dialect]] or [[August Schleicher]] for a longer discussion.)@@@@1@9@@danf@17-8-2009 10450460@unknown@formal@none@1@S@The concepts of [[Ausbausprache - Abstandsprache - Dachsprache|Ausbausprache, Abstandsprache and Dachsprache]] are used to make finer distinctions about the degrees of difference between languages or dialects.@@@@1@26@@danf@17-8-2009 10450470@unknown@formal@none@1@S@==Artificial languages==@@@@1@2@@danf@17-8-2009 10450480@unknown@formal@none@1@S@=== Constructed languages ===@@@@1@4@@danf@17-8-2009 10450490@unknown@formal@none@1@S@Some individuals and groups have constructed their own artificial languages, for practical, experimental, personal, or ideological reasons.@@@@1@17@@danf@17-8-2009 10450500@unknown@formal@none@1@S@International auxiliary languages are generally constructed languages that strive to be easier to learn than natural languages; other constructed languages strive to be more logical ("loglangs") than natural languages; a prominent example of this is [[Lojban]].@@@@1@36@@danf@17-8-2009 10450510@unknown@formal@none@1@S@Some writers, such as [[J. R. R. 
Tolkien]], have created fantasy languages, for literary, [[Artistic language|artistic]] or personal reasons.@@@@1@19@@danf@17-8-2009 10450520@unknown@formal@none@1@S@The fantasy language of the [[Klingon]] race has in recent years been developed by fans of the Star Trek series, complete with a vocabulary and grammar.@@@@1@25@@danf@17-8-2009 10450530@unknown@formal@none@1@S@Constructed languages are not necessarily restricted to the properties shared by natural languages.@@@@1@13@@danf@17-8-2009 10450540@unknown@formal@none@1@S@The [[ISO 639-2]] standard also includes identifiers that denote constructed (or artificial) languages.@@@@1@14@@danf@17-8-2009 10450550@unknown@formal@none@1@S@In order to qualify for inclusion the language must have a literature and it must be designed for the purpose of human communication.@@@@1@23@@danf@17-8-2009 10450560@unknown@formal@none@1@S@Specifically excluded are reconstructed languages and computer programming languages.@@@@1@9@@danf@17-8-2009 10450570@unknown@formal@none@1@S@===International auxiliary languages===@@@@1@3@@danf@17-8-2009 10450580@unknown@formal@none@1@S@Some languages, most of them constructed, are meant specifically for communication between people of different nationalities or language groups as an easy-to-learn second language.@@@@1@22@@danf@17-8-2009 10450590@unknown@formal@none@1@S@Several of these languages have been constructed by individuals or groups.@@@@1@11@@danf@17-8-2009 10450600@unknown@formal@none@1@S@Natural, pre-existing languages may also be used in this way - their developers merely catalogued and standardized their vocabulary and identified their grammatical rules.@@@@1@24@@danf@17-8-2009 10450610@unknown@formal@none@1@S@These languages are called ''naturalistic.''@@@@1@5@@danf@17-8-2009 10450620@unknown@formal@none@1@S@One such language, [[Latino Sine Flexione]], is a simplified form of Latin.@@@@1@12@@danf@17-8-2009 10450630@unknown@formal@none@1@S@Two others, [[Occidental language|Occidental]] and [[Novial]], were drawn from several Western languages.@@@@1@12@@danf@17-8-2009 10450640@unknown@formal@none@1@S@To date, the most successful auxiliary language is [[Esperanto]], invented by Polish ophthalmologist [[L. L. 
Zamenhof|Zamenhof]].@@@@1@16@@danf@17-8-2009 10450650@unknown@formal@none@1@S@It has a relatively large community roughly estimated at about 2 million speakers worldwide, with a large body of literature, songs, and is the only known constructed language to have [[Native Esperanto speakers|native speakers]], such as the Hungarian-born American businessman [[George Soros]].@@@@1@42@@danf@17-8-2009 10450660@unknown@formal@none@1@S@Other auxiliary languages with a relatively large number of speakers and literature are [[Interlingua]] and [[Ido]].@@@@1@16@@danf@17-8-2009 10450670@unknown@formal@none@1@S@===Controlled languages===@@@@1@2@@danf@17-8-2009 10450680@unknown@formal@none@1@S@Controlled natural languages are subsets of natural languages whose grammars and dictionaries have been restricted in order to reduce or eliminate both ambiguity and complexity.@@@@1@25@@danf@17-8-2009 10450690@unknown@formal@none@1@S@The purpose behind the development and implementation of a controlled natural language typically is to aid non-native speakers of a natural language in understanding it, or to ease computer processing of a natural language.@@@@1@34@@danf@17-8-2009 10450700@unknown@formal@none@1@S@An example of a widely used controlled natural language is [[Simplified English]], which was originally developed for [[aerospace]] industry maintenance manuals.@@@@1@21@@danf@17-8-2009 10450710@unknown@formal@none@1@S@== Formal languages ==@@@@1@4@@danf@17-8-2009 10450720@unknown@formal@none@1@S@[[Mathematics]] and [[computer science]] use artificial entities called formal languages (including [[programming language]]s and [[markup language]]s, and some that are more theoretical in nature).@@@@1@24@@danf@17-8-2009 10450730@unknown@formal@none@1@S@These often take the form of [[character string]]s, produced by a combination of [[formal grammar]] and semantics of arbitrary complexity.@@@@1@20@@danf@17-8-2009 10450740@unknown@formal@none@1@S@=== Programming languages ===@@@@1@4@@danf@17-8-2009 10450750@unknown@formal@none@1@S@A programming language is an extreme case of a formal language that can be used to control the behavior of a machine, particularly a computer, to perform specific tasks.@@@@1@29@@danf@17-8-2009 10450760@unknown@formal@none@1@S@Programming languages are defined using syntactic and semantic rules, to determine structure and meaning respectively.@@@@1@15@@danf@17-8-2009 10450770@unknown@formal@none@1@S@Programming languages are used to facilitate communication about the task of organizing and manipulating information, and to express algorithms precisely.@@@@1@20@@danf@17-8-2009 10450780@unknown@formal@none@1@S@Some authors restrict the term "programming language" to those languages that can express all possible algorithms; sometimes the term "computer language" is used for artificial languages that are more limited.@@@@1@30@@danf@17-8-2009 10450790@unknown@formal@none@1@S@== Animal communication ==@@@@1@4@@danf@17-8-2009 10450800@unknown@formal@none@1@S@The term "[[animal language]]s" is often used for non-human languages.@@@@1@10@@danf@17-8-2009 10450810@unknown@formal@none@1@S@Linguists do not consider these to be "language", but describe them as [[animal communication]], because the interaction between animals in such communication is fundamentally different in its underlying principles from human language.@@@@1@32@@danf@17-8-2009 10450820@unknown@formal@none@1@S@Nevertheless, some scholars have tried to disprove this mainstream premise through experiments on training chimpanzees to 
talk.@@@@1@17@@danf@17-8-2009 10450830@unknown@formal@none@1@S@[[Karl von Frisch]] received the Nobel Prize in 1973 for his proof of the language and dialects of the bees.@@@@1@20@@danf@17-8-2009 10450840@unknown@formal@none@1@S@In several publicized instances, non-human animals have been taught to understand certain features of human language.@@@@1@16@@danf@17-8-2009 10450850@unknown@formal@none@1@S@[[Chimpanzee]]s, [[gorilla]]s, and [[orangutan]]s have been taught hand signs based on [[American Sign Language]].@@@@1@14@@danf@17-8-2009 10450860@unknown@formal@none@1@S@The [[African Grey Parrot]], which possesses the ability to mimic human speech with a high degree of accuracy, is suspected of having sufficient intelligence to comprehend some of the speech it mimics.@@@@1@32@@danf@17-8-2009 10450870@unknown@formal@none@1@S@Most species of [[parrot]], despite expert mimicry, are believed to have no linguistic comprehension at all.@@@@1@16@@danf@17-8-2009 10450880@unknown@formal@none@1@S@While proponents of animal communication systems have debated levels of [[semantics]], these systems have not been found to have anything approaching human language [[syntax]].@@@@1@24@@danf@17-8-2009 10460010@unknown@formal@none@1@S@
Language model
@@@@1@2@@danf@17-8-2009 10460020@unknown@formal@none@1@S@A statistical '''language model''' assigns a [[probability]] to a sequence of ''m'' words P(w_1,\\ldots,w_m) by means of a [[probability distribution]].@@@@1@20@@danf@17-8-2009 10460030@unknown@formal@none@1@S@Language modeling is used in many [[natural language processing]] applications such as [[speech recognition]], [[machine translation]], [[part-of-speech tagging]], [[parsing]] and [[information retrieval]].@@@@1@22@@danf@17-8-2009 10460040@unknown@formal@none@1@S@In [[speech recognition]] and in [[data compression]], such a model tries to capture the properties of a language, and to predict the next word in a speech sequence.@@@@1@28@@danf@17-8-2009 10460050@unknown@formal@none@1@S@When used in information retrieval, a language model is associated with a [[document]] in a collection.@@@@1@16@@danf@17-8-2009 10460060@unknown@formal@none@1@S@With query ''Q'' as input, retrieved documents are ranked based on the probability that the document's language model would generate the terms of the query, ''P(Q|Md)''.@@@@1@26@@danf@17-8-2009 10460070@unknown@formal@none@1@S@Estimating the probability of sequences can become difficult in [[corpora]], in which [[phrase]]s or [[Sentence (linguistics)|sentence]]s can be arbitrarily long and hence some sequences are not observed during [[training]] of the language model (the [[data sparseness problem]]).@@@@1@38@@danf@17-8-2009 10460080@unknown@formal@none@1@S@For that reason these models are often approximated using smoothed [[N-gram]] models.@@@@1@12@@danf@17-8-2009 10460090@unknown@formal@none@1@S@== N-gram models ==@@@@1@4@@danf@17-8-2009 10460100@unknown@formal@none@1@S@In an n-gram model, the probability P(w_1,\\ldots,w_m) of observing the sentence w1,...,wm is approximated as@@@@1@15@@danf@17-8-2009 10460110@unknown@formal@none@1@S@ P(w_1,\\ldots,w_m) = \\prod^m_{i=1} P(w_i|w_1,\\ldots,w_{i-1}) \\approx \\prod^m_{i=1} P(w_i|w_{i-(n-1)},\\ldots,w_{i-1}) @@@@1@9@@danf@17-8-2009 10460120@unknown@formal@none@1@S@Here, it is assumed that the probability of observing the ''ith'' word ''wi'' in the context history of the preceding ''i-1'' words can be approximated by the probability of observing it in the shortened context history of the preceding ''n-1'' words (''n''th-order [[Markov property]]).@@@@1@45@@danf@17-8-2009 10460130@unknown@formal@none@1@S@The conditional probability can be calculated from n-gram frequency counts: P(w_i|w_{i-(n-1)},\\ldots,w_{i-1}) = \\frac{count(w_{i-(n-1)},\\ldots,w_{i-1},w_i)}{count(w_{i-(n-1)},\\ldots,w_{i-1})} @@@@1@15@@danf@17-8-2009 10460140@unknown@formal@none@1@S@The words '''bigram''' and '''trigram''' language model denote n-gram language models with ''n=2'' and ''n=3'', respectively.@@@@1@16@@danf@17-8-2009 10460150@unknown@formal@none@1@S@=== Example ===@@@@1@3@@danf@17-8-2009 10460160@unknown@formal@none@1@S@In a bigram (n=2) language model, the probability of the sentence ''I saw the red house'' is approximated as P(I,saw,the,red,house) \\approx P(I) P(saw|I) P(the|saw) P(red|the) P(house|red) @@@@1@28@@danf@17-8-2009 10460170@unknown@formal@none@1@S@whereas in a trigram (n=3) language model, the approximation is P(I,saw,the,red,house) \\approx P(I) P(saw|I) P(the|I,saw) P(red|saw,the) P(house|the,red) @@@@1@19@@danf@17-8-2009
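As an illustration of how such estimates are read off raw frequency counts, the following is a minimal sketch of an unsmoothed bigram (n=2) model; the class name BigramModel and the two toy training sentences are invented for this illustration:

 // BigramModel.java
 import java.util.HashMap;
 import java.util.Map;
 
 public class BigramModel {
     private final Map<String, Integer> unigrams = new HashMap<String, Integer>();
     private final Map<String, Integer> bigrams = new HashMap<String, Integer>();
 
     // Count the unigrams and bigrams of one training sentence.
     public void train(String[] sentence) {
         for (int i = 0; i < sentence.length; i++) {
             add(unigrams, sentence[i]);
             if (i + 1 < sentence.length) {
                 add(bigrams, sentence[i] + " " + sentence[i + 1]);
             }
         }
     }
 
     // P(next | prev) = count(prev, next) / count(prev), the n=2 case of the formula above.
     public double probability(String prev, String next) {
         Integer pair = bigrams.get(prev + " " + next);
         Integer context = unigrams.get(prev);
         if (pair == null || context == null) {
             return 0.0; // unseen events get zero probability without smoothing
         }
         return pair.doubleValue() / context.doubleValue();
     }
 
     private static void add(Map<String, Integer> counts, String key) {
         Integer c = counts.get(key);
         counts.put(key, c == null ? 1 : c + 1);
     }
 
     public static void main(String[] args) {
         BigramModel model = new BigramModel();
         model.train(new String[] {"I", "saw", "the", "red", "house"});
         model.train(new String[] {"I", "saw", "the", "dog"});
         System.out.println(model.probability("saw", "the")); // 2/2 = 1.0
         System.out.println(model.probability("the", "red")); // 1/2 = 0.5
     }
 }

With these two toy sentences, count(saw the) = 2 and count(saw) = 2, so P(the|saw) = 1.0, while count(the red) = 1 and count(the) = 2, so P(red|the) = 0.5; a smoothed n-gram model would additionally reserve probability mass for bigrams that were never observed in training.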
10470010@unknown@formal@none@1@S@
Latent semantic analysis
@@@@1@3@@danf@17-8-2009 10470020@unknown@formal@none@1@S@'''Latent semantic analysis (LSA)''' is a technique in [[natural language processing]], in particular in [[vectorial semantics]], for analyzing relationships between a set of documents and the terms they contain by producing a set of concepts related to the documents and terms.@@@@1@41@@danf@17-8-2009 10470030@unknown@formal@none@1@S@LSA was patented in [[1988]] ([http://patft.uspto.gov/netacgi/nph-Parser?patentnumber=4839853 US Patent 4,839,853]) by [[Scott Deerwester]], [[Susan Dumais]], [[George Furnas]], [[Richard Harshman]], [[Thomas Landauer]], [[Karen Lochbaum]] and [[Lynn Streeter]].@@@@1@25@@danf@17-8-2009 10470040@unknown@formal@none@1@S@In the context of its application to [[information retrieval]], it is sometimes called '''latent semantic indexing (LSI)'''.@@@@1@17@@danf@17-8-2009 10470050@unknown@formal@none@1@S@== Occurrence matrix ==@@@@1@4@@danf@17-8-2009 10470060@unknown@formal@none@1@S@LSA can use a [[term-document matrix]] which describes the occurrences of terms in documents; it is a [[sparse matrix]] whose rows correspond to [[terminology|terms]] and whose columns correspond to documents, typically [[stemming|stemmed]] words that appear in the documents.@@@@1@38@@danf@17-8-2009 10470070@unknown@formal@none@1@S@A typical example of the weighting of the elements of the matrix is [[tf-idf]] (term frequency–inverse document frequency): the element of the matrix is proportional to the number of times the terms appear in each document, where rare terms are upweighted to reflect their relative importance.@@@@1@46@@danf@17-8-2009 10470080@unknown@formal@none@1@S@This matrix is also common to standard semantic models, though it is not necessarily explicitly expressed as a matrix, since the mathematical properties of matrices are not always used.@@@@1@29@@danf@17-8-2009 10470090@unknown@formal@none@1@S@LSA transforms the occurrence matrix into a relation between the terms and some ''concepts'', and a relation between those concepts and the documents.@@@@1@23@@danf@17-8-2009 10470100@unknown@formal@none@1@S@Thus the terms and documents are now indirectly related through the concepts.@@@@1@12@@danf@17-8-2009 10470110@unknown@formal@none@1@S@== Applications ==@@@@1@3@@danf@17-8-2009 10470120@unknown@formal@none@1@S@The new concept space typically can be used to:@@@@1@9@@danf@17-8-2009 10470130@unknown@formal@none@1@S@* Compare the documents in the concept space ([[data clustering]], [[document classification]]).@@@@1@12@@danf@17-8-2009 10470140@unknown@formal@none@1@S@* Find similar documents across languages, after analyzing a base set of translated documents ([[cross language retrieval]]).@@@@1@17@@danf@17-8-2009 10470150@unknown@formal@none@1@S@* Find relations between terms ([[synonymy]] and [[polysemy]]).@@@@1@8@@danf@17-8-2009 10470160@unknown@formal@none@1@S@* Given a query of terms, translate it into the concept space, and find matching documents ([[information retrieval]]).@@@@1@18@@danf@17-8-2009 10470170@unknown@formal@none@1@S@Synonymy and polysemy are fundamental problems in [[natural language processing]]:@@@@1@10@@danf@17-8-2009 10470180@unknown@formal@none@1@S@* Synonymy is the phenomenon where different words describe the same idea.@@@@1@12@@danf@17-8-2009 10470190@unknown@formal@none@1@S@Thus, a query in a search engine may fail to retrieve a relevant document that does not contain the words which appeared in the query.@@@@1@25@@danf@17-8-2009 10470200@unknown@formal@none@1@S@For example, a search for 
"doctors" may not return a document containing the word "physicians", even though the words have the same meaning.@@@@1@23@@danf@17-8-2009 10470210@unknown@formal@none@1@S@* Polysemy is the phenomenon where the same word has multiple meanings.@@@@1@12@@danf@17-8-2009 10470220@unknown@formal@none@1@S@So a search may retrieve irrelevant documents containing the desired words in the wrong meaning.@@@@1@15@@danf@17-8-2009 10470230@unknown@formal@none@1@S@For example, a botanist and a computer scientist looking for the word "tree" probably desire different sets of documents.@@@@1@19@@danf@17-8-2009 10470240@unknown@formal@none@1@S@== Rank lowering ==@@@@1@4@@danf@17-8-2009 10470250@unknown@formal@none@1@S@After the construction of the occurrence matrix, LSA finds a low-[[rank (matrix theory)|rank]] approximation to the [[term-document matrix]].@@@@1@18@@danf@17-8-2009 10470260@unknown@formal@none@1@S@There could be various reasons for these approximations:@@@@1@8@@danf@17-8-2009 10470270@unknown@formal@none@1@S@* The original term-document matrix is presumed too large for the computing resources; in this case, the approximated low rank matrix is interpreted as an ''approximation'' (a "least and necessary evil").@@@@1@31@@danf@17-8-2009 10470280@unknown@formal@none@1@S@* The original term-document matrix is presumed ''noisy'': for example, anecdotal instances of terms are to be eliminated.@@@@1@18@@danf@17-8-2009 10470290@unknown@formal@none@1@S@From this point of view, the approximated matrix is interpreted as a ''de-noisified matrix'' (a better matrix than the original).@@@@1@20@@danf@17-8-2009 10470300@unknown@formal@none@1@S@* The original term-document matrix is presumed overly [[Sparse matrix|sparse]] relative to the "true" term-document matrix.@@@@1@16@@danf@17-8-2009 10470310@unknown@formal@none@1@S@That is, the original matrix lists only the words actually ''in'' each document, whereas we might be interested in all words ''related to'' each document--generally a much larger set due to [[synonymy]].@@@@1@32@@danf@17-8-2009 10470320@unknown@formal@none@1@S@The consequence of the rank lowering is that some dimensions are combined and depend on more than one term:@@@@1@19@@danf@17-8-2009 10470330@unknown@formal@none@1@S@:: {(car), (truck), (flower)} --> {(1.3452 * car + 0.2828 * truck), (flower)}@@@@1@13@@danf@17-8-2009 10470340@unknown@formal@none@1@S@This mitigates synonymy, as the rank lowering is expected to merge the dimensions associated with terms that have similar meanings.@@@@1@20@@danf@17-8-2009 10470350@unknown@formal@none@1@S@It also mitigates polysemy, since components of polysemous words that point in the "right" direction are added to the components of words that share a similar meaning.@@@@1@27@@danf@17-8-2009 10470360@unknown@formal@none@1@S@Conversely, components that point in other directions tend to either simply cancel out, or, at worst, to be smaller than components in the directions corresponding to the intended sense.@@@@1@29@@danf@17-8-2009 10470370@unknown@formal@none@1@S@== Derivation ==@@@@1@3@@danf@17-8-2009 10470380@unknown@formal@none@1@S@Let X be a matrix where element (i,j) describes the occurrence of term i in document j (this can be, for example, the frequency).@@@@1@24@@danf@17-8-2009 10470385@unknown@formal@none@1@S@X will look like this:@@@@1@5@@danf@17-8-2009 10470390@unknown@formal@none@1@S@: \\begin{matrix} & \\textbf{d}_j \\\\ & \\downarrow \\\\ \\textbf{t}_i^T \\rightarrow & \\begin{bmatrix} x_{1,1} & \\dots & x_{1,n} \\\\ \\vdots & \\ddots & 
\\vdots \\\\ x_{m,1} & \\dots & x_{m,n} \\\\ \\end{bmatrix} \\end{matrix} @@@@1@33@@danf@17-8-2009 10470400@unknown@formal@none@1@S@Now a row in this matrix will be a vector corresponding to a term, giving its relation to each document:@@@@1@20@@danf@17-8-2009 10470410@unknown@formal@none@1@S@:\\textbf{t}_i^T = \\begin{bmatrix} x_{i,1} & \\dots & x_{i,n} \\end{bmatrix}@@@@1@9@@danf@17-8-2009 10470420@unknown@formal@none@1@S@Likewise, a column in this matrix will be a vector corresponding to a document, giving its relation to each term:@@@@1@20@@danf@17-8-2009 10470430@unknown@formal@none@1@S@:\\textbf{d}_j = \\begin{bmatrix} x_{1,j} \\\\ \\vdots \\\\ x_{m,j} \\end{bmatrix}@@@@1@9@@danf@17-8-2009 10470440@unknown@formal@none@1@S@Now the [[dot product]] \\textbf{t}_i^T \\textbf{t}_p between two term vectors gives the [[correlation]] between the terms over the documents.@@@@1@19@@danf@17-8-2009 10470450@unknown@formal@none@1@S@The [[matrix product]] X X^T contains all these dot products.@@@@1@10@@danf@17-8-2009 10470460@unknown@formal@none@1@S@Element (i,p) (which is equal to element (p,i)) contains the dot product \\textbf{t}_i^T \\textbf{t}_p ( = \\textbf{t}_p^T \\textbf{t}_i).@@@@1@18@@danf@17-8-2009 10470470@unknown@formal@none@1@S@Likewise, the matrix X^T X contains the dot products between all the document vectors, giving their correlation over the terms: \\textbf{d}_j^T \\textbf{d}_q = \\textbf{d}_q^T \\textbf{d}_j.@@@@1@25@@danf@17-8-2009 10470480@unknown@formal@none@1@S@Now assume that there exists a decomposition of X such that U and V are [[orthonormal matrix|orthonormal matrices]] and \\Sigma is a [[diagonal matrix]].@@@@1@24@@danf@17-8-2009 10470490@unknown@formal@none@1@S@This is called a [[singular value decomposition]] (SVD):@@@@1@8@@danf@17-8-2009 10470500@unknown@formal@none@1@S@: X = U \\Sigma V^T @@@@1@7@@danf@17-8-2009 10470510@unknown@formal@none@1@S@The matrix products giving us the term and document correlations then become@@@@1@12@@danf@17-8-2009 10470520@unknown@formal@none@1@S@: \\begin{matrix} X X^T &=& (U \\Sigma V^T) (U \\Sigma V^T)^T = (U \\Sigma V^T) (V^{T^T} \\Sigma^T U^T) = U \\Sigma V^T V \\Sigma^T U^T = U \\Sigma \\Sigma^T U^T \\\\ X^T X &=& (U \\Sigma V^T)^T (U \\Sigma V^T) = (V^{T^T} \\Sigma^T U^T) (U \\Sigma V^T) = V \\Sigma U^T U \\Sigma V^T = V \\Sigma^T \\Sigma V^T \\end{matrix} @@@@1@61@@danf@17-8-2009 10470530@unknown@formal@none@1@S@Since \\Sigma \\Sigma^T and \\Sigma^T \\Sigma are diagonal we see that U must contain the [[eigenvector]]s of X X^T, while V must be the eigenvectors of X^T X.@@@@1@28@@danf@17-8-2009 10470540@unknown@formal@none@1@S@Both products have the same non-zero eigenvalues, given by the non-zero entries of \\Sigma \\Sigma^T, or equally, by the non-zero entries of \\Sigma^T\\Sigma.@@@@1@23@@danf@17-8-2009 10470550@unknown@formal@none@1@S@Now the decomposition looks like this:@@@@1@6@@danf@17-8-2009 10470560@unknown@formal@none@1@S@: \\begin{matrix} & X & & & U & & \\Sigma & & V^T \\\\ & (\\textbf{d}_j) & & & & & & & (\\hat \\textbf{d}_j) \\\\ & \\downarrow & & & & & & & \\downarrow \\\\ (\\textbf{t}_i^T) \\rightarrow & \\begin{bmatrix} x_{1,1} & \\dots & x_{1,n} \\\\ \\\\ \\vdots & \\ddots & \\vdots \\\\ \\\\ x_{m,1} & \\dots & x_{m,n} \\\\ \\end{bmatrix} & = & (\\hat \\textbf{t}_i^T) \\rightarrow & \\begin{bmatrix} \\begin{bmatrix} \\, \\\\ \\, \\\\ \\textbf{u}_1 \\\\ \\, \\\\ \\,\\end{bmatrix} \\dots \\begin{bmatrix} \\, \\\\ \\, \\\\ \\textbf{u}_l \\\\ \\, \\\\ \\, \\end{bmatrix} \\end{bmatrix} & \\cdot & \\begin{bmatrix} 
\\sigma_1 & \\dots & 0 \\\\ \\vdots & \\ddots & \\vdots \\\\ 0 & \\dots & \\sigma_l \\\\ \\end{bmatrix} & \\cdot & \\begin{bmatrix} \\begin{bmatrix} & & \\textbf{v}_1 & & \\end{bmatrix} \\\\ \\vdots \\\\ \\begin{bmatrix} & & \\textbf{v}_l & & \\end{bmatrix} \\end{bmatrix} \\end{matrix} @@@@1@141@@danf@17-8-2009 10470570@unknown@formal@none@1@S@The values \\sigma_1, \\dots, \\sigma_l are called the singular values, and u_1, \\dots, u_l and v_1, \\dots, v_l the left and right singular vectors.@@@@1@24@@danf@17-8-2009 10470580@unknown@formal@none@1@S@Notice how the only part of U that contributes to \\textbf{t}_i is the i\\textrm{'th} row.@@@@1@15@@danf@17-8-2009 10470590@unknown@formal@none@1@S@Let this row vector be called \\hat \\textbf{t}_i.@@@@1@8@@danf@17-8-2009 10470600@unknown@formal@none@1@S@Likewise, the only part of V^T that contributes to \\textbf{d}_j is the j\\textrm{'th} column, \\hat \\textbf{d}_j.@@@@1@16@@danf@17-8-2009 10470610@unknown@formal@none@1@S@These are ''not'' the eigenvectors, but ''depend'' on ''all'' the eigenvectors.@@@@1@11@@danf@17-8-2009 10470620@unknown@formal@none@1@S@It turns out that when you select the k largest singular values, and their corresponding singular vectors from U and V, you get the rank k approximation to X with the smallest error ([[Frobenius norm]]).@@@@1@35@@danf@17-8-2009 10470630@unknown@formal@none@1@S@The amazing thing about this approximation is that not only does it have a minimal error, but it translates the term and document vectors into a concept space.@@@@1@28@@danf@17-8-2009 10470640@unknown@formal@none@1@S@The vector \\hat \\textbf{t}_i then has k entries, each giving the occurrence of term i in one of the k concepts.@@@@1@21@@danf@17-8-2009 10470650@unknown@formal@none@1@S@Likewise, the vector \\hat \\textbf{d}_j gives the relation between document j and each concept.@@@@1@14@@danf@17-8-2009 10470660@unknown@formal@none@1@S@We write this approximation as@@@@1@5@@danf@17-8-2009 10470670@unknown@formal@none@1@S@:X_k = U_k \\Sigma_k V_k^T@@@@1@5@@danf@17-8-2009 10470680@unknown@formal@none@1@S@You can now do the following:@@@@1@6@@danf@17-8-2009 10470690@unknown@formal@none@1@S@* See how related documents j and q are in the concept space by comparing the vectors \\hat \\textbf{d}_j and \\hat \\textbf{d}_q (typically by [[vector space model|cosine similarity]]).@@@@1@28@@danf@17-8-2009 10470700@unknown@formal@none@1@S@This gives you a clustering of the documents.@@@@1@8@@danf@17-8-2009 10470710@unknown@formal@none@1@S@* Compare terms i and p by comparing the vectors \\hat \\textbf{t}_i and \\hat \\textbf{t}_p, giving you a clustering of the terms in the concept space.@@@@1@26@@danf@17-8-2009 10470720@unknown@formal@none@1@S@* Given a query, view this as a mini document, and compare it to your documents in the concept space.@@@@1@20@@danf@17-8-2009 10470730@unknown@formal@none@1@S@To do the latter, you must first translate your query into the concept space.@@@@1@14@@danf@17-8-2009 10470740@unknown@formal@none@1@S@It is then intuitive that you must use the same transformation that you use on your documents:@@@@1@17@@danf@17-8-2009 10470750@unknown@formal@none@1@S@:\\textbf{d}_j = U_k \\Sigma_k \\hat \\textbf{d}_j@@@@1@6@@danf@17-8-2009 10470760@unknown@formal@none@1@S@:\\hat \\textbf{d}_j = \\Sigma_k^{-1} U_k^T \\textbf{d}_j@@@@1@6@@danf@17-8-2009 10470770@unknown@formal@none@1@S@This means that if you have a query vector q, you must do the translation \\hat \\textbf{q} = \\Sigma_k^{-1} U_k^T \\textbf{q} before you compare it with the document vectors in the concept space.@@@@1@33@@danf@17-8-2009
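The following is a minimal illustrative sketch of this query-folding step followed by a cosine comparison in the concept space; the class name LsaQueryFolding, the k=2 matrices and all numeric values are invented for the illustration and do not come from a real decomposition:

 // LsaQueryFolding.java
 public class LsaQueryFolding {
 
     // Multiply a k x m matrix by a length-m vector.
     static double[] multiply(double[][] a, double[] v) {
         double[] result = new double[a.length];
         for (int i = 0; i < a.length; i++) {
             for (int j = 0; j < v.length; j++) {
                 result[i] += a[i][j] * v[j];
             }
         }
         return result;
     }
 
     // Cosine similarity between two vectors of equal length.
     static double cosine(double[] a, double[] b) {
         double dot = 0, na = 0, nb = 0;
         for (int i = 0; i < a.length; i++) {
             dot += a[i] * b[i];
             na += a[i] * a[i];
             nb += b[i] * b[i];
         }
         return dot / (Math.sqrt(na) * Math.sqrt(nb));
     }
 
     public static void main(String[] args) {
         // U_k^T for k = 2 concepts and m = 3 terms (illustrative values only).
         double[][] ukT = {
             {0.6, 0.6, 0.5},
             {0.7, -0.4, -0.6},
         };
         double[] singularValues = {2.0, 1.0}; // the diagonal of Sigma_k
 
         // A raw query vector q of length m, using the first and third terms.
         double[] q = {1, 0, 1};
 
         // q_hat = Sigma_k^{-1} * (U_k^T * q): divide each component by its singular value.
         double[] qHat = multiply(ukT, q);
         for (int i = 0; i < qHat.length; i++) {
             qHat[i] /= singularValues[i];
         }
 
         // A document already represented in the concept space (an illustrative d_hat).
         double[] dHat = {0.9, -0.1};
         System.out.println("similarity = " + cosine(qHat, dHat));
     }
 }

In a full system the same folding would be applied to every query, and documents would be ranked by their cosine similarity to the folded query vector.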
with the document vectors in the concept space.@@@@1@33@@danf@17-8-2009 10470780@unknown@formal@none@1@S@You can do the same for pseudo-term vectors:@@@@1@9@@danf@17-8-2009 10470790@unknown@formal@none@1@S@:\\textbf{t}_i^T = \\hat \\textbf{t}_i^T \\Sigma_k V_k^T@@@@1@6@@danf@17-8-2009 10470800@unknown@formal@none@1@S@:\\hat \\textbf{t}_i^T = \\textbf{t}_i^T V_k^{-T} \\Sigma_k^{-1} = \\textbf{t}_i^T V_k \\Sigma_k^{-1}@@@@1@10@@danf@17-8-2009 10470810@unknown@formal@none@1@S@:\\hat \\textbf{t}_i = \\Sigma_k^{-1} V_k^T \\textbf{t}_i@@@@1@6@@danf@17-8-2009 10470820@unknown@formal@none@1@S@== Implementation ==@@@@1@3@@danf@17-8-2009 10470830@unknown@formal@none@1@S@The [[Singular Value Decomposition|SVD]] is typically computed using large matrix methods (for example, [[Lanczos method]]s) but may also be computed incrementally and with greatly reduced resources via a [[neural network]]-like approach, which does not require the large, full-rank matrix to be held in memory ([http://www.dcs.shef.ac.uk/~genevieve/gorrell_webb.pdf Gorrell and Webb, 2005]).@@@@1@49@@danf@17-8-2009 10470840@unknown@formal@none@1@S@A fast, incremental, low-memory, large-matrix SVD algorithm has recently been developed ([http://www.merl.com/publications/TR2006-059/ Brand, 2006]).@@@@1@14@@danf@17-8-2009 10470850@unknown@formal@none@1@S@Unlike Gorrell and Webb's (2005) stochastic approximation, Brand's (2006) algorithm provides an exact solution.@@@@1@14@@danf@17-8-2009 10470860@unknown@formal@none@1@S@== Limitations ==@@@@1@3@@danf@17-8-2009 10470870@unknown@formal@none@1@S@LSA has two drawbacks:@@@@1@4@@danf@17-8-2009 10470880@unknown@formal@none@1@S@* The resulting dimensions might be difficult to interpret.@@@@1@9@@danf@17-8-2009 10470890@unknown@formal@none@1@S@For instance, in@@@@1@3@@danf@17-8-2009 10470900@unknown@formal@none@1@S@:: {(car), (truck), (flower)} --> {(1.3452 * car + 0.2828 * truck), (flower)}@@@@1@13@@danf@17-8-2009 10470910@unknown@formal@none@1@S@:the (1.3452 * car + 0.2828 * truck) component could be interpreted as "vehicle".@@@@1@14@@danf@17-8-2009 10470920@unknown@formal@none@1@S@However, it is very likely that cases close to@@@@1@9@@danf@17-8-2009 10470930@unknown@formal@none@1@S@:: {(car), (bottle), (flower)} --> {(1.3452 * car + 0.2828 * bottle), (flower)}@@@@1@13@@danf@17-8-2009 10470940@unknown@formal@none@1@S@:will occur.@@@@1@2@@danf@17-8-2009 10470950@unknown@formal@none@1@S@This leads to results which can be justified on the mathematical level, but have no interpretable meaning in natural language.@@@@1@20@@danf@17-8-2009 10470960@unknown@formal@none@1@S@* The [[probabilistic model]] of LSA does not match observed data: LSA assumes that words and documents form a joint [[normal distribution|Gaussian]] model ([[ergodic hypothesis]]), while a [[Poisson distribution]] has been observed.@@@@1@32@@danf@17-8-2009 10470970@unknown@formal@none@1@S@Thus, a newer alternative is [[probabilistic latent semantic analysis]], based on a [[multinomial distribution|multinomial]] model, which is reported to give better results than standard LSA.@@@@1@26@@danf@17-8-2009
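The following minimal sketch, which is not part of the original article, illustrates the steps described above: the truncated [[singular value decomposition]] of a term-document matrix, the resulting concept-space vectors, and the translation of a query into the concept space. It assumes Python with the NumPy library and a small, purely hypothetical term-document count matrix; a practical system would use much larger sparse matrices and usually a weighting scheme such as tf-idf.

 # Minimal LSA sketch (illustrative only): SVD of a small term-document
 # matrix, rank-k truncation, and folding a query into the concept space.
 import numpy as np
 
 # Hypothetical term-document count matrix X (m terms x n documents).
 X = np.array([[2., 0., 1., 0.],
               [1., 1., 0., 0.],
               [0., 2., 0., 1.],
               [0., 0., 1., 2.]])
 
 # Singular value decomposition: X = U * Sigma * V^T.
 U, s, Vt = np.linalg.svd(X, full_matrices=False)
 
 # Keep only the k largest singular values; this gives the rank-k
 # approximation of X with the smallest error in the Frobenius norm.
 k = 2
 U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
 
 # Concept-space vectors: row i of U_k corresponds to term i, and
 # column j of Vt_k corresponds to document j.
 term_vectors = U_k        # shape (m, k)
 doc_vectors = Vt_k        # shape (k, n)
 
 def cosine(a, b):
     # Cosine similarity, the comparison typically used in the concept space.
     return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
 
 # Treat a query as a mini document of term counts and translate it into
 # the concept space: q_hat = Sigma_k^{-1} * U_k^T * q.
 q = np.array([1., 1., 0., 0.])
 q_hat = np.diag(1.0 / s_k) @ U_k.T @ q
 
 # Rank the documents by their similarity to the query in the concept space.
 for j in range(doc_vectors.shape[1]):
     print("document", j, round(cosine(q_hat, doc_vectors[:, j]), 3))

Here full_matrices=False computes the reduced form of the decomposition, which is sufficient because only the k largest singular values and their singular vectors are retained.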
10480010@unknown@formal@none@1@S@
Linguistics
@@@@1@1@@danf@17-8-2009 10480020@unknown@formal@none@1@S@'''Linguistics''' is the [[science|scientific]] study of [[language]], encompassing a number of sub-fields.@@@@1@12@@danf@17-8-2009 10480030@unknown@formal@none@1@S@An important topical division is between the study of language structure ([[grammar]]) and the study of [[Meaning (linguistics)|meaning]] ([[semantics]]).@@@@1@19@@danf@17-8-2009 10480040@unknown@formal@none@1@S@Grammar encompasses [[morphology (linguistics)|morphology]] (the formation and composition of [[word]]s), [[syntax]] (the rules that determine how words combine into [[phrase]]s and [[Sentence (linguistics)|sentences]]) and [[phonology]] (the study of sound systems and abstract sound units).@@@@1@34@@danf@17-8-2009 10480050@unknown@formal@none@1@S@[[Phonetics]] is a related branch of linguistics concerned with the actual properties of speech sounds ([[phone]]s), non-speech sounds, and how they are produced and [[speech perception|perceived]].@@@@1@26@@danf@17-8-2009 10480060@unknown@formal@none@1@S@Over the twentieth century, following the work of [[Noam Chomsky]], linguistics came to be dominated by the [[Generative grammar|Generativist school]], which is chiefly concerned with explaining how human beings [[language acquisition|acquire language]] and the biological constraints on this acquisition; generative theory is [[Language module|modularist]] in character.@@@@1@46@@danf@17-8-2009 10480070@unknown@formal@none@1@S@While this remains the dominant paradigm, other linguistic theories have increasingly gained in popularity — [[cognitive linguistics]] being a prominent example.@@@@1@21@@danf@17-8-2009 10480080@unknown@formal@none@1@S@There are many sub-fields in linguistics, which may or may not be dominated by a particular theoretical approach: [[evolutionary linguistics]], for example, attempts to account for the origins of language; [[historical linguistics]] explores language change; and [[sociolinguistics]] looks at the relation between linguistic variation and social structures.@@@@1@47@@danf@17-8-2009 10480090@unknown@formal@none@1@S@A variety of intellectual disciplines are relevant to the study of language.@@@@1@12@@danf@17-8-2009 10480100@unknown@formal@none@1@S@Although certain linguists have downplayed the relevance of some other fields, linguistics — like other sciences — is highly interdisciplinary and draws on work from such fields as [[psychology]], [[informatics]], [[computer science]], [[philosophy]], [[biology]], [[human anatomy]], [[neuroscience]], [[sociology]], [[anthropology]], and [[acoustics]].@@@@1@41@@danf@17-8-2009 10480110@unknown@formal@none@1@S@==Names for the discipline==@@@@1@4@@danf@17-8-2009 10480120@unknown@formal@none@1@S@Before the twentieth century (the word is first attested in 1716), the term "[[philology]]" was commonly used to refer to the science of language, which was then predominantly historical in focus.@@@@1@30@@danf@17-8-2009 10480130@unknown@formal@none@1@S@Since [[Ferdinand de Saussure]]'s insistence on the importance of [[Synchronic analysis (linguistics)|synchronic analysis]], however, this focus has shifted and the term "philology" is now generally used for the "study of a language's grammar, history and literary tradition", especially in the [[USA]], where it was never as popular as elsewhere in the sense "science of language".@@@@1@55@@danf@17-8-2009 10480140@unknown@formal@none@1@S@The term "linguistics" dates from 1847, although "linguist" in the sense "a student of language" dates from 
1641.@@@@1@18@@danf@17-8-2009 10480150@unknown@formal@none@1@S@It is now the usual academic term in English for the scientific study of language.@@@@1@15@@danf@17-8-2009 10480160@unknown@formal@none@1@S@==Fundamental concerns and divisions==@@@@1@4@@danf@17-8-2009 10480170@unknown@formal@none@1@S@Linguistics concerns itself with describing and explaining the nature of human language.@@@@1@12@@danf@17-8-2009 10480180@unknown@formal@none@1@S@Relevant to this are the questions of what is universal to language, how language can vary, and how human beings come to know languages.@@@@1@24@@danf@17-8-2009 10480190@unknown@formal@none@1@S@All humans (setting aside extremely pathological cases) achieve competence in whatever language is spoken (or signed, in the case of [[sign language|signed languages]]) around them when growing up, with apparently little need for explicit conscious instruction.@@@@1@36@@danf@17-8-2009 10480200@unknown@formal@none@1@S@While non-humans acquire their own communication systems, they do not acquire human language in this way (although many non-human animals can learn to respond to language, or can even be trained to use it to a degree).@@@@1@37@@danf@17-8-2009 10480210@unknown@formal@none@1@S@Therefore, linguists assume, the ability to acquire and use language is an innate, biologically-based potential of modern human beings, similar to the ability to walk.@@@@1@25@@danf@17-8-2009 10480220@unknown@formal@none@1@S@There is no consensus, however, as to the extent of this innate potential, or its domain-specificity (the degree to which such innate abilities are specific to language), with some theorists claiming that there is a very large set of highly abstract and specific binary settings coded into the human brain, while others claim that the ability to learn language is a product of general human cognition.@@@@1@66@@danf@17-8-2009 10480230@unknown@formal@none@1@S@It is, however, generally agreed that there are no strong ''genetic'' differences underlying the differences between languages: an individual will acquire whatever language(s) they are exposed to as a child, regardless of parentage or ethnic origin.@@@@1@36@@danf@17-8-2009 10480240@unknown@formal@none@1@S@Linguistic structures are pairings of meaning and form (which may consist of sound patterns, movements of the hand, written symbols, and so on); such pairings are known as [[Ferdinand de Saussure|Saussurean]] [[linguistic sign|signs]].@@@@1@33@@danf@17-8-2009 10480250@unknown@formal@none@1@S@Linguists may specialize in some sub-area of linguistic structure, which can be arranged in the following terms, from form to meaning:@@@@1@21@@danf@17-8-2009 10480260@unknown@formal@none@1@S@* '''[[Phonetics]]''', the study of the physical properties of speech (or signed) production and perception@@@@1@15@@danf@17-8-2009 10480270@unknown@formal@none@1@S@* '''[[Phonology]]''', the study of sounds (adjusted appropriately for signed languages) as discrete, abstract elements in the speaker's mind that distinguish meaning@@@@1@22@@danf@17-8-2009 10480280@unknown@formal@none@1@S@* '''[[Morphology (linguistics)|Morphology]]''', the study of internal structures of [[word]]s and how they can be modified@@@@1@16@@danf@17-8-2009 10480290@unknown@formal@none@1@S@* '''[[Syntax]]''', the study of how words combine to form grammatical [[sentence]]s@@@@1@12@@danf@17-8-2009 10480300@unknown@formal@none@1@S@* '''[[Semantics]]''', the study of the meaning of words ([[lexical semantics]]) and fixed word combinations ([[phraseology]]), and 
how these combine to form the [[meaning]]s of sentences@@@@1@26@@danf@17-8-2009 10480310@unknown@formal@none@1@S@* '''[[Pragmatics]]''', the study of how [[utterance]]s are used (literally, figuratively, or otherwise) in [[speech acts|communicative acts]]@@@@1@17@@danf@17-8-2009 10480320@unknown@formal@none@1@S@* '''[[Discourse analysis]]''', the analysis of language use in [[texts]] (spoken, written, or signed)@@@@1@14@@danf@17-8-2009 10480330@unknown@formal@none@1@S@Many linguists would agree that these divisions overlap considerably, and the independent significance of each of these areas is not universally acknowledged.@@@@1@22@@danf@17-8-2009 10480340@unknown@formal@none@1@S@Regardless of any particular linguist's position, each area has core concepts that foster significant scholarly inquiry and research.@@@@1@18@@danf@17-8-2009 10480350@unknown@formal@none@1@S@Intersecting with these domains are fields arranged around the kind of external factors that are considered.@@@@1@16@@danf@17-8-2009 10480360@unknown@formal@none@1@S@For example:@@@@1@2@@danf@17-8-2009 10480370@unknown@formal@none@1@S@* [[Linguistic typology]], the study of the common properties of diverse unrelated languages, properties that may, given sufficient attestation, be assumed to be innate to human language capacity.@@@@1@28@@danf@17-8-2009 10480380@unknown@formal@none@1@S@* [[Stylistics (linguistics)|Stylistics]], the study of linguistic factors that place a discourse in context.@@@@1@14@@danf@17-8-2009 10480390@unknown@formal@none@1@S@* [[Developmental linguistics]], the study of the development of linguistic ability in an individual, particularly [[Language acquisition|the acquisition of language]] in childhood.@@@@1@22@@danf@17-8-2009 10480400@unknown@formal@none@1@S@* [[Historical linguistics]] or Diachronic linguistics, the study of language change.@@@@1@11@@danf@17-8-2009 10480410@unknown@formal@none@1@S@* [[Language geography]], the study of the spatial patterns of languages.@@@@1@11@@danf@17-8-2009 10480420@unknown@formal@none@1@S@* [[Evolutionary linguistics]], the study of the origin and subsequent development of language.@@@@1@13@@danf@17-8-2009 10480430@unknown@formal@none@1@S@* [[Psycholinguistics]], the study of the cognitive processes and representations underlying language use.@@@@1@13@@danf@17-8-2009 10480440@unknown@formal@none@1@S@* [[Sociolinguistics]], the study of social patterns and norms of linguistic variability.@@@@1@12@@danf@17-8-2009 10480450@unknown@formal@none@1@S@* [[Clinical linguistics]], the application of linguistic theory to the area of [[Speech-Language Pathology]].@@@@1@14@@danf@17-8-2009 10480460@unknown@formal@none@1@S@* [[Neurolinguistics]], the study of the brain networks that underlie grammar and communication.@@@@1@13@@danf@17-8-2009 10480470@unknown@formal@none@1@S@* [[Biolinguistics]], the study of natural as well as human-taught communication systems in animals compared to human language.@@@@1@18@@danf@17-8-2009 10480480@unknown@formal@none@1@S@* [[Computational linguistics]], the study of computational implementations of linguistic structures.@@@@1@11@@danf@17-8-2009 10480490@unknown@formal@none@1@S@* [[Applied linguistics]], the study of language-related issues applied in everyday life, notably language 
policies, planning, and education.@@@@1@19@@danf@17-8-2009 10480500@unknown@formal@none@1@S@[[Constructed language]] fits under Applied linguistics.@@@@1@6@@danf@17-8-2009 10480510@unknown@formal@none@1@S@The related discipline of [[semiotics]] investigates the relationship between signs and what they signify.@@@@1@14@@danf@17-8-2009 10480520@unknown@formal@none@1@S@From the perspective of semiotics, language can be seen as a sign or symbol, with the world as its representation.@@@@1@20@@danf@17-8-2009 10480530@unknown@formal@none@1@S@==Variation and universality==@@@@1@3@@danf@17-8-2009 10480540@unknown@formal@none@1@S@Much modern linguistic research, particularly within the [[paradigm]] of [[generative grammar]], has concerned itself with trying to account for differences between languages of the world.@@@@1@25@@danf@17-8-2009 10480550@unknown@formal@none@1@S@This has worked on the assumption that if human linguistic ability is narrowly constrained by human biology, then all languages must share certain fundamental properties.@@@@1@25@@danf@17-8-2009 10480560@unknown@formal@none@1@S@In [[generative grammar|generativist theory]], the collection of fundamental properties all languages share are referred to as [[universal grammar]] (UG).@@@@1@19@@danf@17-8-2009 10480570@unknown@formal@none@1@S@The specific characteristics of this universal grammar are a much debated topic.@@@@1@12@@danf@17-8-2009 10480580@unknown@formal@none@1@S@[[Linguistic typology|Typologists]] and non-generativist linguists usually refer simply to [[linguistic universal|language universals]], or ''universals of language''.@@@@1@16@@danf@17-8-2009 10480590@unknown@formal@none@1@S@Similarities between languages can have a number of different origins.@@@@1@10@@danf@17-8-2009 10480600@unknown@formal@none@1@S@In the simplest case, universal properties may be due to universal aspects of human experience.@@@@1@15@@danf@17-8-2009 10480610@unknown@formal@none@1@S@For example, all humans experience water, and all human languages have a word for water.@@@@1@15@@danf@17-8-2009 10480620@unknown@formal@none@1@S@Other similarities may be due to common descent: the [[Latin language]] spoken by the [[Ancient Rome|Ancient Romans]] developed into Spanish in Spain and Italian in Italy; similarities between Spanish and Italian are thus in many cases due to both being descended from Latin.@@@@1@43@@danf@17-8-2009 10480630@unknown@formal@none@1@S@In other cases, [[Language contact|contact between languages]] — particularly where many speakers are bilingual — can lead to much borrowing of structures, as well as words.@@@@1@26@@danf@17-8-2009 10480640@unknown@formal@none@1@S@Similarity may also, of course, be due to coincidence.@@@@1@9@@danf@17-8-2009 10480650@unknown@formal@none@1@S@English ''much'' and Spanish ''mucho'' are not descended from the same form or borrowed from one language to the other; nor is the similarity due to innate linguistic knowledge (see [[False cognate]]).@@@@1@32@@danf@17-8-2009 10480660@unknown@formal@none@1@S@Arguments in favor of language universals have also come from documented cases of [[sign language]]s (such as [[Al-Sayyid Bedouin Sign Language]]) developing in communities of congenitally deaf people, independently of spoken language.@@@@1@32@@danf@17-8-2009 10480670@unknown@formal@none@1@S@The properties of these sign languages conform generally to many of the properties of spoken languages.@@@@1@16@@danf@17-8-2009 10480680@unknown@formal@none@1@S@Other known and suspected sign language [[language 
isolate|isolates]] include [[Kata Kolok]], [[Nicaraguan Sign Language]], and [[Providence Island Sign Language]].@@@@1@19@@danf@17-8-2009 10480690@unknown@formal@none@1@S@== Structures ==@@@@1@3@@danf@17-8-2009 10480700@unknown@formal@none@1@S@It has been observed that languages tend to be organized around [[grammatical categories]] such as noun and verb, [[nominative case|nominative]] and [[accusative case|accusative]], or present and past, though, importantly, not exclusively so.@@@@1@32@@danf@17-8-2009 10480710@unknown@formal@none@1@S@The grammar of a language is organized around such fundamental categories, though many languages express the relationships between words and syntax in other discrete ways (cf. some Bantu languages for noun/verb relations, ergative/absolutive systems for case relations, several Native American languages for tense/aspect relations).@@@@1@44@@danf@17-8-2009 10480720@unknown@formal@none@1@S@In addition to making substantial use of discrete categories, language has the important property that it organizes elements into recursive structures; this allows, for example, a noun phrase to contain another noun phrase (as in “the chimpanzee’s lips”) or a clause to contain a clause (as in “I think that it’s raining”).@@@@1@52@@danf@17-8-2009 10480730@unknown@formal@none@1@S@Though recursion in grammar was implicitly recognized much earlier (for example by [[Otto Jespersen|Jespersen]]), the importance of this aspect of language became more widely recognized after the 1957 publication of [[Noam Chomsky]]’s book “[[Syntactic Structures]]”, which presented a formal grammar of a fragment of English.@@@@1@45@@danf@17-8-2009 10480740@unknown@formal@none@1@S@Prior to this, the most detailed descriptions of linguistic systems were of phonological or morphological systems.@@@@1@16@@danf@17-8-2009 10480750@unknown@formal@none@1@S@Chomsky used a [[context-free grammar]] augmented with transformations.@@@@1@8@@danf@17-8-2009 10480760@unknown@formal@none@1@S@Since then, following the trend of Chomskyan linguistics, context-free grammars have been written for substantial fragments of various languages (for example [[Generalised phrase structure grammar|GPSG]], for English), but it has been demonstrated that human languages include cross-serial dependencies, which cannot be handled adequately by context-free grammars.@@@@1@46@@danf@17-8-2009 10480770@unknown@formal@none@1@S@==Some selected sub-fields==@@@@1@4@@danf@17-8-2009 10480780@unknown@formal@none@1@S@'''Diachronic linguistics'''@@@@1@2@@danf@17-8-2009 10480790@unknown@formal@none@1@S@Studying languages at a particular point in time (usually the present) is "synchronic", while diachronic linguistics examines how language changes through time, sometimes over centuries.@@@@1@25@@danf@17-8-2009 10480800@unknown@formal@none@1@S@It enjoys both a rich history and a strong theoretical foundation for the study of [[language change]].@@@@1@17@@danf@17-8-2009 10480810@unknown@formal@none@1@S@In universities in the United States, the historic perspective is often out of fashion.@@@@1@14@@danf@17-8-2009 10480820@unknown@formal@none@1@S@The shift in focus to a non-historic perspective started with [[Ferdinand de Saussure|Saussure]] and became predominant with [[Noam Chomsky]].@@@@1@19@@danf@17-8-2009 10480830@unknown@formal@none@1@S@Explicitly historical perspectives include [[historical-comparative linguistics]] and [[etymology]].@@@@1@8@@danf@17-8-2009 10480840@unknown@formal@none@1@S@'''Contextual linguistics'''@@@@1@2@@danf@17-8-2009 
10480850@unknown@formal@none@1@S@Contextual linguistics may include the study of linguistics in interaction with other academic disciplines.@@@@1@14@@danf@17-8-2009 10480860@unknown@formal@none@1@S@The interdisciplinary areas of linguistics consider how language interacts with the rest of the world.@@@@1@15@@danf@17-8-2009 10480870@unknown@formal@none@1@S@[[Sociolinguistics]], [[anthropological linguistics]], and [[linguistic anthropology]] are seen as areas that bridge the gap between linguistics and society as a whole.@@@@1@21@@danf@17-8-2009 10480880@unknown@formal@none@1@S@[[Psycholinguistics]] and [[neurolinguistics]] relate linguistics to the [[medical science]]s.@@@@1@9@@danf@17-8-2009 10480890@unknown@formal@none@1@S@Other cross-disciplinary areas of linguistics include [[evolutionary linguistics]], [[computational linguistics]] and [[cognitive science]].@@@@1@13@@danf@17-8-2009 10480900@unknown@formal@none@1@S@'''Applied linguistics'''@@@@1@2@@danf@17-8-2009 10480910@unknown@formal@none@1@S@Linguists are largely concerned with finding and [[descriptive linguistics|describing]] the generalities and varieties both within particular languages and among all languages.@@@@1@21@@danf@17-8-2009 10480920@unknown@formal@none@1@S@[[Applied linguistics]] takes the results of those findings and “applies” them to other areas.@@@@1@14@@danf@17-8-2009 10480930@unknown@formal@none@1@S@Often “applied linguistics” refers to the use of linguistic research in language teaching, but results of linguistic research are used in many other areas, as well.@@@@1@26@@danf@17-8-2009 10480940@unknown@formal@none@1@S@Today in the age of information technology, many areas of applied linguistics attempt to involve the use of computers.@@@@1@19@@danf@17-8-2009 10480950@unknown@formal@none@1@S@[[Speech synthesis]] and [[speech recognition]] use phonetic and phonemic knowledge to provide voice interfaces to computers.@@@@1@16@@danf@17-8-2009 10480960@unknown@formal@none@1@S@Applications of [[computational linguistics]] in [[machine translation]], [[computer-assisted translation]], and [[natural language processing]] are areas of applied linguistics which have come to the forefront.@@@@1@24@@danf@17-8-2009 10480970@unknown@formal@none@1@S@Their influence has had an effect on theories of syntax and semantics, as modeling syntactic and semantic theories on computers imposes constraints on those theories.@@@@1@21@@danf@17-8-2009 10480980@unknown@formal@none@1@S@==Description and prescription==@@@@1@3@@danf@17-8-2009 10480990@unknown@formal@none@1@S@''Main articles: [[Descriptive linguistics]], [[Linguistic prescription]]''@@@@1@6@@danf@17-8-2009 10481000@unknown@formal@none@1@S@Linguistics is '''descriptive'''; linguists describe and explain features of language without making subjective judgments on whether a particular feature is "right" or "wrong".@@@@1@23@@danf@17-8-2009 10481010@unknown@formal@none@1@S@This is analogous to practice in other sciences: a [[zoologist]] studies the animal kingdom without making subjective judgments on whether a particular animal is better or worse than another.@@@@1@29@@danf@17-8-2009 10481020@unknown@formal@none@1@S@'''Prescription''', on the other hand, is an attempt to promote particular linguistic usages over others, often favouring a particular dialect or "[[acrolect]]".@@@@1@22@@danf@17-8-2009 10481030@unknown@formal@none@1@S@This may have the aim of establishing a [[Standard language|linguistic standard]], which can aid communication over large geographical areas.@@@@1@19@@danf@17-8-2009 
10481040@unknown@formal@none@1@S@It may also, however, be an attempt by speakers of one language or dialect to exert influence over speakers of other languages or dialects (see [[Linguistic imperialism]]).@@@@1@27@@danf@17-8-2009 10481050@unknown@formal@none@1@S@An extreme version of prescriptivism can be found among [[censorship|censors]], who attempt to eradicate words and structures which they consider to be destructive to society.@@@@1@25@@danf@17-8-2009 10481060@unknown@formal@none@1@S@== Speech and writing ==@@@@1@5@@danf@17-8-2009 10481070@unknown@formal@none@1@S@Most contemporary linguists work under the assumption that [[spoken language|spoken]] (or signed) language is more fundamental than [[written language]].@@@@1@19@@danf@17-8-2009 10481080@unknown@formal@none@1@S@This is because:@@@@1@3@@danf@17-8-2009 10481090@unknown@formal@none@1@S@* Speech appears to be a human "universal", whereas there have been many [[culture]]s and speech communities that lack written communication;@@@@1@21@@danf@17-8-2009 10481100@unknown@formal@none@1@S@* Speech evolved before human beings discovered writing;@@@@1@8@@danf@17-8-2009 10481110@unknown@formal@none@1@S@* People learn to speak and process spoken languages more easily and much earlier than writing;@@@@1@16@@danf@17-8-2009 10481120@unknown@formal@none@1@S@Linguists nonetheless agree that the study of written language can be worthwhile and valuable.@@@@1@14@@danf@17-8-2009 10481130@unknown@formal@none@1@S@For research that relies on [[corpus linguistics]] and [[computational linguistics]], written language is often much more convenient for processing large amounts of linguistic data.@@@@1@24@@danf@17-8-2009 10481140@unknown@formal@none@1@S@Large corpora of spoken language are difficult to create and hard to find, and are typically [[transcription (linguistics)|transcribed]] and written.@@@@1@20@@danf@17-8-2009 10481150@unknown@formal@none@1@S@Additionally, linguists have turned to text-based discourse occurring in various formats of [[computer-mediated communication]] as a viable site for linguistic inquiry.@@@@1@21@@danf@17-8-2009 10481160@unknown@formal@none@1@S@The study of [[writing systems]] themselves is in any case considered a branch of linguistics.@@@@1@15@@danf@17-8-2009 10481170@unknown@formal@none@1@S@== History ==@@@@1@3@@danf@17-8-2009 10481180@unknown@formal@none@1@S@Some of the earliest linguistic activities can be recalled from [[Iron Age India]] with the analysis of [[Sanskrit]].@@@@1@18@@danf@17-8-2009 10481190@unknown@formal@none@1@S@The [[Pratishakhya]]s (from ca. the 8th century BC) constitute as it were a proto-linguistic ''ad hoc'' collection of observations about mutations to a given [[corpus linguistics|corpus]] particular to a given [[Shakha|Vedic school]].@@@@1@32@@danf@17-8-2009 10481200@unknown@formal@none@1@S@Systematic study of these texts gives rise to the [[Vedanga]] discipline of [[Vyakarana]], the earliest surviving account of which is the work of {{IAST|[[Pānini]]}} (c. 
520 – 460 BC), who, however, looks back on what are probably several generations of grammarians, whose opinions he occasionally refers to.@@@@1@47@@danf@17-8-2009 10481210@unknown@formal@none@1@S@{{IAST|Pānini}} formulates close to 4,000 rules which together form a compact [[generative grammar]] of Sanskrit.@@@@1@15@@danf@17-8-2009 10481220@unknown@formal@none@1@S@Inherent in his analytic approach are the concepts of the [[phoneme]], the [[morpheme]] and the [[root]].@@@@1@16@@danf@17-8-2009 10481230@unknown@formal@none@1@S@Due to its focus on brevity, his grammar has a highly unintuitive structure, reminiscent of contemporary "machine language" (as opposed to "human readable" programming languages).@@@@1@25@@danf@17-8-2009 10481240@unknown@formal@none@1@S@Indian linguistics maintained a high level for several centuries; [[Mahābhāṣya|Patanjali]] in the 2nd century BC still actively criticizes Panini.@@@@1@19@@danf@17-8-2009 10481250@unknown@formal@none@1@S@In the later centuries BC, however, Panini's grammar came to be seen as prescriptive, and commentators came to be fully dependent on it.@@@@1@23@@danf@17-8-2009 10481260@unknown@formal@none@1@S@[[Bhartrihari]] (c. 450 – 510) theorized the act of speech as being made up of four stages: first, conceptualization of an idea; second, its verbalization and sequencing (articulation); third, delivery of speech into atmospheric air; and fourth, the interpretation of speech by the listener, the interpreter.@@@@1@45@@danf@17-8-2009 10481270@unknown@formal@none@1@S@In the [[Middle East]], the [[Persian language|Persian]] linguist [[Sibawayh]] made a detailed and professional description of [[Arabic language|Arabic]] in 760, in his monumental work, ''Al-kitab fi al-nahw'' (الكتاب في النحو, ''The Book on Grammar''), bringing many linguistic aspects of language to light.@@@@1@42@@danf@17-8-2009 10481280@unknown@formal@none@1@S@In his book he distinguished [[phonetics]] from [[phonology]].@@@@1@8@@danf@17-8-2009 10481290@unknown@formal@none@1@S@Western linguistics begins in Classical Antiquity with grammatical speculation such as [[Plato]]'s ''[[Cratylus]]''.@@@@1@13@@danf@17-8-2009 10481300@unknown@formal@none@1@S@[[William Jones (philologist)|Sir William Jones]] noted that [[Sanskrit]] shared many common features with classical [[Latin]] and [[Ancient Greek|Greek]], notably verb roots and grammatical structures, such as the [[case system]].@@@@1@29@@danf@17-8-2009 10481310@unknown@formal@none@1@S@This led to the theory that all languages sprang from a common source and to the discovery of the [[Indo-European]] [[language family]].@@@@1@22@@danf@17-8-2009 10481320@unknown@formal@none@1@S@He began the study of [[comparative linguistics]], which would uncover more language families and branches.@@@@1@15@@danf@17-8-2009 10481330@unknown@formal@none@1@S@Some early-19th-century linguists were [[Jakob Grimm]], who devised a principle of consonantal shifts in pronunciation – known as [[Grimm's Law]] – in 1822; [[Karl Verner]], who formulated [[Verner's Law]]; [[August Schleicher]], who created the "Stammbaumtheorie" ("family tree"); and [[Johannes Schmidt (linguist)|Johannes Schmidt]], who developed the "Wellentheorie" ("wave model") in 1872.@@@@1@50@@danf@17-8-2009 10481340@unknown@formal@none@1@S@[[Ferdinand de Saussure]] was the founder of modern structural linguistics.@@@@1@10@@danf@17-8-2009 10481350@unknown@formal@none@1@S@[[Edward Sapir]], a leader in American structural linguistics, was one of the first to explore the relations between language studies and 
anthropology.@@@@1@22@@danf@17-8-2009 10481360@unknown@formal@none@1@S@His methodology had strong influence on all his successors.@@@@1@9@@danf@17-8-2009 10481370@unknown@formal@none@1@S@[[Noam Chomsky|Noam Chomsky's]] formal model of language, [[transformational-generative grammar]], developed under the influence of his teacher [[Zellig Harris]], who was in turn strongly influenced by [[Leonard Bloomfield]], has been the dominant model since the 1960s.@@@@1@35@@danf@17-8-2009 10481380@unknown@formal@none@1@S@[[Noam Chomsky]] remains a pop-linguistic figure.@@@@1@6@@danf@17-8-2009 10481390@unknown@formal@none@1@S@Linguists (working in frameworks such as [[Head-Driven Phrase Structure Grammar]] (HPSG) or [[Lexical Functional Grammar]] (LFG)) are increasingly seen to stress the importance of formalization and formal rigor in linguistic description, and may distance themselves somewhat from Chomsky's more recent work (the "Minimalist" program for [[Transformational grammar]]), connecting more closely to his earlier works.@@@@1@54@@danf@17-8-2009 10481400@unknown@formal@none@1@S@Other linguists working in [[Optimality Theory]] state generalizations in terms of violable constraints that interact with each other, and abandon the traditional rule-based formalism first pioneered by early work in generativist linguistics.@@@@1@32@@danf@17-8-2009 10481410@unknown@formal@none@1@S@Functionalist linguists working in [[functional grammar]] and [[Cognitive Linguistics]] tend to stress the non-autonomy of linguistic knowledge and the non-universality of linguistic structures, thus differing significantly from the Chomskyan school.@@@@1@30@@danf@17-8-2009 10481420@unknown@formal@none@1@S@They reject Chomskyan intuitive introspection as a scientific method, relying instead on typological evidence.@@@@1@14@@danf@17-8-2009 10490010@unknown@formal@none@1@S@
Linux
@@@@1@1@@danf@17-8-2009 10490020@unknown@formal@none@1@S@'''Linux''' (commonly pronounced {{IPAEng|ˈlɪnəks}} in English; variants exist) is a [[Unix-like]] computer [[operating system]].@@@@1@14@@danf@17-8-2009 10490030@unknown@formal@none@1@S@Linux is one of the most prominent examples of [[free software]] and [[open source]] development: typically all underlying [[source code]] can be freely modified, used, and redistributed by anyone.@@@@1@29@@danf@17-8-2009 10490040@unknown@formal@none@1@S@The name "Linux" comes from the [[Linux kernel]], originally written in 1991 by [[Linus Torvalds]].@@@@1@15@@danf@17-8-2009 10490050@unknown@formal@none@1@S@The system's [[system utility|utilities]] and [[library (computer science)|libraries]] usually come from the [[GNU operating system]], announced in 1983 by [[Richard Stallman]].@@@@1@21@@danf@17-8-2009 10490060@unknown@formal@none@1@S@The GNU contribution is the basis for the alternative name '''GNU/Linux'''.@@@@1@11@@danf@17-8-2009 10490070@unknown@formal@none@1@S@Predominantly known for its use in [[server (computing)|server]]s, Linux is supported by corporations such as [[Dell]], [[Hewlett-Packard]], [[IBM]], [[Novell]], [[Oracle Corporation]], [[Red Hat]], and [[Sun Microsystems]].@@@@1@26@@danf@17-8-2009 10490080@unknown@formal@none@1@S@It is used as an operating system for a wide variety of computer [[hardware]], including [[desktop computer]]s, [[supercomputers]], video game systems, such as the [[PlayStation 2]] and [[PlayStation 3]], several [[arcade games]], and [[embedded devices]] such as [[mobile phone]]s, [[routers]], and [[stage lighting]] systems.@@@@1@44@@danf@17-8-2009 10490090@unknown@formal@none@1@S@== History ==@@@@1@3@@danf@17-8-2009 10490100@unknown@formal@none@1@S@The [[Unix]] operating system was conceived and implemented in the 1960s and first released in 1970.@@@@1@16@@danf@17-8-2009 10490110@unknown@formal@none@1@S@Its wide availability and [[Porting|portability]] meant that it was widely adopted, copied and modified by academic institutions and businesses, with its design being influential on authors of other systems.@@@@1@29@@danf@17-8-2009 10490120@unknown@formal@none@1@S@The [[GNU Project]], started in 1984, had the goal of creating a "''complete Unix-compatible software system''" made entirely of [[free software]].@@@@1@21@@danf@17-8-2009 10490130@unknown@formal@none@1@S@In 1985, [[Richard Stallman]] created the [[Free Software Foundation]] and developed the [[GNU General Public License]] (GNU GPL).@@@@1@18@@danf@17-8-2009 10490140@unknown@formal@none@1@S@Many of the programs required in an OS (such as libraries, [[compiler]]s, [[text editor]]s, a [[Unix shell]], and a windowing system) were completed by the early 1990s, although low level elements such as [[device driver]]s, [[daemon (computer software)|daemon]]s, and the [[Kernel (computer science)|kernel]] were stalled and incomplete.@@@@1@47@@danf@17-8-2009 10490150@unknown@formal@none@1@S@Linus Torvalds has said that if the GNU kernel had been available at the time (1991), he would not have decided to write his own.@@@@1@25@@danf@17-8-2009 10490160@unknown@formal@none@1@S@=== MINIX ===@@@@1@3@@danf@17-8-2009 10490170@unknown@formal@none@1@S@[[MINIX]], a Unix-like system intended for academic use, was released by [[Andrew S. 
Tanenbaum]] in 1987.@@@@1@16@@danf@17-8-2009 10490180@unknown@formal@none@1@S@While source code for the system was available, modification and redistribution were restricted (that is not the case today).@@@@1@19@@danf@17-8-2009 10490190@unknown@formal@none@1@S@In addition, MINIX's [[16-bit]] design was not well adapted to the [[32-bit]] design of the increasingly cheap and popular [[Intel 386]] architecture for personal computers.@@@@1@25@@danf@17-8-2009 10490200@unknown@formal@none@1@S@In 1991, Torvalds began to work on a non-commercial replacement for MINIX while he was attending the [[University of Helsinki]].@@@@1@20@@danf@17-8-2009 10490210@unknown@formal@none@1@S@This eventually became the [[Linux kernel]].@@@@1@6@@danf@17-8-2009 10490220@unknown@formal@none@1@S@In 1992, Tanenbaum posted an article on [[Usenet]] claiming Linux was obsolete.@@@@1@12@@danf@17-8-2009 10490230@unknown@formal@none@1@S@In the article, he criticized the operating system as being [[Monolithic kernel|monolithic]] in design and being tied closely to the x86 architecture and thus not portable, which he described as "a fundamental error."@@@@1@32@@danf@17-8-2009 10490240@unknown@formal@none@1@S@Tanenbaum suggested that those who wanted a modern operating system should look into one based on the [[microkernel]] model.@@@@1@19@@danf@17-8-2009 10490250@unknown@formal@none@1@S@The posting elicited the response of Torvalds and [[Ken Thompson]], one of the founders of [[Unix]], which resulted in a well-known debate over the microkernel and monolithic kernel designs.@@@@1@30@@danf@17-8-2009 10490260@unknown@formal@none@1@S@Linux was dependent on the MINIX [[user space]] at first.@@@@1@10@@danf@17-8-2009 10490270@unknown@formal@none@1@S@With code from the GNU system freely available, it was advantageous if this could be used with the fledgling OS.@@@@1@20@@danf@17-8-2009 10490275@unknown@formal@none@1@S@Code licensed under the GNU GPL can be used in other projects, so long as they also are released under the same or a compatible license.@@@@1@26@@danf@17-8-2009 10490280@unknown@formal@none@1@S@In order to make the Linux kernel compatible with the components from the GNU Project, Torvalds initiated a switch from his original license (which prohibited commercial redistribution) to the GNU GPL.@@@@1@31@@danf@17-8-2009 10490290@unknown@formal@none@1@S@Linux and GNU developers worked to integrate GNU components with Linux to make a fully functional and free operating system.@@@@1@20@@danf@17-8-2009 10490300@unknown@formal@none@1@S@=== Commercial and popular uptake ===@@@@1@6@@danf@17-8-2009 10490310@unknown@formal@none@1@S@Today Linux is used in numerous domains, from [[embedded system]]s to [[supercomputer]]s, and has secured a place in [[server (computing)|server]] installations with the popular [[LAMP (software bundle)|LAMP]] application stack.@@@@1@29@@danf@17-8-2009 10490320@unknown@formal@none@1@S@Torvalds continues to direct the development of the kernel.@@@@1@9@@danf@17-8-2009 10490330@unknown@formal@none@1@S@Stallman heads the Free Software Foundation, which in turn supports the GNU components.@@@@1@13@@danf@17-8-2009 10490340@unknown@formal@none@1@S@Finally, individuals and corporations develop third-party non-GNU components.@@@@1@8@@danf@17-8-2009 10490350@unknown@formal@none@1@S@These third-party components comprise a vast body of work and may include both kernel modules and user applications and libraries.@@@@1@20@@danf@17-8-2009 10490360@unknown@formal@none@1@S@Linux vendors and communities combine and distribute the 
kernel, GNU components, and non-GNU components, with additional package management software in the form of [[Linux distribution]]s.@@@@1@25@@danf@17-8-2009 10490370@unknown@formal@none@1@S@== Design ==@@@@1@3@@danf@17-8-2009 10490380@unknown@formal@none@1@S@Linux is a modular [[Unix-like]] operating system.@@@@1@7@@danf@17-8-2009 10490390@unknown@formal@none@1@S@It derives much of its basic design from principles established in Unix during the 1970s and 1980s.@@@@1@17@@danf@17-8-2009 10490400@unknown@formal@none@1@S@Linux uses a [[monolithic kernel]], the [[Linux kernel]], which handles process control, networking, and [[peripheral]] and [[file system]] access.@@@@1@19@@danf@17-8-2009 10490410@unknown@formal@none@1@S@[[Device drivers]] are integrated directly with the kernel.@@@@1@8@@danf@17-8-2009 10490420@unknown@formal@none@1@S@Much of Linux's higher-level functionality is provided by separate projects which interface with the kernel.@@@@1@15@@danf@17-8-2009 10490430@unknown@formal@none@1@S@The GNU [[Userland (computing)|userland]] is an important part of most Linux systems, providing the [[shell (computing)|shell]] and [[Unix tool]]s which carry out many basic operating system tasks.@@@@1@27@@danf@17-8-2009 10490440@unknown@formal@none@1@S@On top of these tools, a Linux system can provide a [[graphical user interface]], usually running in the [[X Window System]].@@@@1@24@@danf@17-8-2009 10490450@unknown@formal@none@1@S@=== User interface ===@@@@1@4@@danf@17-8-2009 10490460@unknown@formal@none@1@S@Linux can be controlled by one or more of a text-based [[command line interface]] (CLI), [[graphical user interface]] (GUI) (usually the default for desktop), or through controls on the device itself (common on embedded machines).@@@@1@35@@danf@17-8-2009 10490470@unknown@formal@none@1@S@On desktop machines, [[KDE]], [[GNOME]] and [[Xfce]] are the most popular user interfaces, though a variety of other user interfaces exist.@@@@1@21@@danf@17-8-2009 10490480@unknown@formal@none@1@S@Most popular user interfaces run on top of the [[X Window System]] (X), which provides [[network transparency]], enabling a graphical application running on one machine to be displayed and controlled from another.@@@@1@32@@danf@17-8-2009 10490490@unknown@formal@none@1@S@Other GUIs include [[X window manager]]s such as [[FVWM]], [[Enlightenment (window manager)|Enlightenment]] and [[Window Maker]].@@@@1@15@@danf@17-8-2009 10490500@unknown@formal@none@1@S@The window manager provides a means to control the placement and appearance of individual application windows, and interacts with the X window system.@@@@1@23@@danf@17-8-2009 10490510@unknown@formal@none@1@S@A Linux system usually provides a [[CLI]] of some sort through a [[Shell (computing)|shell]], which is the traditional way of interacting with a Unix system.@@@@1@25@@danf@17-8-2009 10490520@unknown@formal@none@1@S@A Linux distribution specialized for servers may use the CLI as its only interface.@@@@1@14@@danf@17-8-2009 10490530@unknown@formal@none@1@S@A “headless system”, run without even a monitor, can be controlled by the command line via a protocol such as [[Secure Shell|SSH]] or [[telnet]].@@@@1@24@@danf@17-8-2009 10490540@unknown@formal@none@1@S@Most low-level Linux components, including the GNU [[Userland (computing)|Userland]], use the CLI exclusively.@@@@1@13@@danf@17-8-2009 10490550@unknown@formal@none@1@S@The CLI is particularly suited for automation of repetitive or delayed tasks, and provides very simple [[inter-process 
communication]].@@@@1@18@@danf@17-8-2009 10490560@unknown@formal@none@1@S@A graphical [[terminal emulator]] program is often used to access the CLI from a Linux desktop.@@@@1@16@@danf@17-8-2009 10490570@unknown@formal@none@1@S@== Development ==@@@@1@3@@danf@17-8-2009 10490580@unknown@formal@none@1@S@The primary difference between Linux and many other popular contemporary operating systems is that the [[Linux kernel]] and other components are [[free software|free]] and [[open source software]].@@@@1@27@@danf@17-8-2009 10490590@unknown@formal@none@1@S@Linux is not the only such operating system, although it is the best-known and most widely used.@@@@1@17@@danf@17-8-2009 10490600@unknown@formal@none@1@S@Some [[free software license|free]] and [[open source license|open source]] software licences are based on the principle of [[copyleft]], a kind of reciprocity: any work derived from a copyleft piece of software must also be copyleft itself.@@@@1@36@@danf@17-8-2009 10490610@unknown@formal@none@1@S@The most common free software license, the [[GNU GPL]], is a form of copyleft, and is used for the Linux kernel and many of the components from the [[GNU project]].@@@@1@30@@danf@17-8-2009 10490620@unknown@formal@none@1@S@As an operating system [[underdog (competition)|underdog]] competing with mainstream operating systems, Linux cannot rely on a [[monopoly]] advantage; in order for Linux to be convenient for users, Linux aims for [[interoperability]] with other operating systems and established computing standards.@@@@1@39@@danf@17-8-2009 10490630@unknown@formal@none@1@S@Linux systems adhere to [[POSIX]], [[Single UNIX Specification|SUS]], [[International Organization for Standardization|ISO]] and [[American National Standards Institute|ANSI]] standards where possible, although to date only one Linux distribution has been POSIX.1 certified, Linux-FT.@@@@1@32@@danf@17-8-2009 10490640@unknown@formal@none@1@S@Free software projects, although developed in a [[Collaboration|collaborative]] fashion, are often produced independently of each other.@@@@1@16@@danf@17-8-2009 10490650@unknown@formal@none@1@S@However, given that the software licenses explicitly permit redistribution, this provides a basis for larger scale projects that collect the software produced by stand-alone projects and make it available all at once in the form of a [[Linux distribution]].@@@@1@39@@danf@17-8-2009 10490660@unknown@formal@none@1@S@A [[Linux distribution]], commonly called a “distro”, is a project that manages a remote collection of Linux-based software, and facilitates installation of a Linux operating system.@@@@1@26@@danf@17-8-2009 10490670@unknown@formal@none@1@S@Distributions are maintained by individuals, loose-knit teams, volunteer organizations, and commercial entities.@@@@1@12@@danf@17-8-2009 10490680@unknown@formal@none@1@S@They include system software and [[application software]] in the form of ''packages'', and distribution-specific software for initial system installation and configuration as well as later package upgrades and installs.@@@@1@29@@danf@17-8-2009 10490690@unknown@formal@none@1@S@A distribution is responsible for the default configuration of installed Linux systems, system security, and more generally integration of the different software packages into a coherent whole.@@@@1@27@@danf@17-8-2009 10490700@unknown@formal@none@1@S@=== Community ===@@@@1@3@@danf@17-8-2009 10490710@unknown@formal@none@1@S@Linux is largely driven by its developer and user communities.@@@@1@10@@danf@17-8-2009 
10490720@unknown@formal@none@1@S@Some vendors develop and fund their distributions on a volunteer basis, [[Debian]] being a well-known example.@@@@1@16@@danf@17-8-2009 10490730@unknown@formal@none@1@S@Others maintain a community version of their commercial distributions, as [[Red Hat]] does with [[Fedora (Linux distribution)|Fedora]].@@@@1@17@@danf@17-8-2009 10490740@unknown@formal@none@1@S@In many cities and regions, local associations known as [[Linux Users Group]]s (LUGs) seek to promote Linux and by extension free software.@@@@1@22@@danf@17-8-2009 10490750@unknown@formal@none@1@S@They hold meetings and provide free demonstrations, training, technical support, and operating system installation to new users.@@@@1@17@@danf@17-8-2009 10490760@unknown@formal@none@1@S@There are also many [[Internet]] communities that seek to provide support to Linux users and developers.@@@@1@16@@danf@17-8-2009 10490770@unknown@formal@none@1@S@Most distributions and open source projects have [[IRC]] chatrooms or [[newsgroup]]s.@@@@1@11@@danf@17-8-2009 10490780@unknown@formal@none@1@S@[[Online forum]]s are another means for support, with notable examples being [[LinuxQuestions.org]] and the [[Gentoo Linux|Gentoo]] forums.@@@@1@17@@danf@17-8-2009 10490790@unknown@formal@none@1@S@Linux distributions host [[mailing list]]s; commonly there will be a specific topic such as usage or development for a given list.@@@@1@21@@danf@17-8-2009 10490800@unknown@formal@none@1@S@There are several technology websites with a Linux focus.@@@@1@9@@danf@17-8-2009 10490810@unknown@formal@none@1@S@[[Linux Weekly News]] is a weekly digest of Linux-related news; the [[Linux Journal]] is an online magazine of Linux articles published monthly; [[Slashdot]] is a technology-related news website with many stories on Linux and open source software; [[Groklaw]] has written in depth about Linux-related legal proceedings and there are many articles relevant to the Linux kernel and its relationship with [[GNU]] on the [[GNU Project|GNU project's]] website.@@@@1@67@@danf@17-8-2009 10490820@unknown@formal@none@1@S@Print [[magazine]]s on Linux often include [[cover disk]]s including software or even complete Linux distributions.@@@@1@15@@danf@17-8-2009 10490830@unknown@formal@none@1@S@Although Linux is generally available free of charge, several large corporations have established business models that involve selling, supporting, and contributing to Linux and free software.@@@@1@26@@danf@17-8-2009 10490840@unknown@formal@none@1@S@These include [[Dell]], [[IBM]], [[Hewlett-Packard|HP]], [[Sun Microsystems]], [[Novell]], and [[Red Hat]].@@@@1@11@@danf@17-8-2009 10490850@unknown@formal@none@1@S@The free software licenses on which Linux is based explicitly accommodate and encourage commercialization; the relationship between Linux as a whole and individual vendors may be seen as [[symbiosis|symbiotic]].@@@@1@29@@danf@17-8-2009 10490860@unknown@formal@none@1@S@One common business model of commercial suppliers is charging for support, especially for business users.@@@@1@15@@danf@17-8-2009 10490870@unknown@formal@none@1@S@A number of companies also offer a specialized business version of their distribution, which adds proprietary support packages and tools to administer higher numbers of installations or to simplify administrative tasks.@@@@1@31@@danf@17-8-2009 10490880@unknown@formal@none@1@S@Another business model is to give away the software in order to sell hardware.@@@@1@14@@danf@17-8-2009 10490890@unknown@formal@none@1@S@=== Programming on Linux 
===@@@@1@5@@danf@17-8-2009 10490900@unknown@formal@none@1@S@Most Linux distributions support dozens of [[programming language]]s.@@@@1@8@@danf@17-8-2009 10490910@unknown@formal@none@1@S@The most common collection of utilities for building both Linux applications and operating system programs is found within the [[GNU toolchain]], which includes the [[GNU Compiler Collection]] (GCC) and the [[GNU build system]].@@@@1@33@@danf@17-8-2009 10490920@unknown@formal@none@1@S@Amongst others, GCC provides compilers for [[Ada (programming language)|Ada]], [[C (programming language)|C]], [[C++]], [[Java (programming language)|Java]], and [[Fortran]].@@@@1@18@@danf@17-8-2009 10490930@unknown@formal@none@1@S@The Linux kernel itself is written to be compiled with GCC.@@@@1@11@@danf@17-8-2009 10490940@unknown@formal@none@1@S@[[Proprietary software|Proprietary]] compilers for Linux include the [[Intel C++ Compiler]] and IBM XL C/C++ Compiler.@@@@1@15@@danf@17-8-2009 10490950@unknown@formal@none@1@S@Most distributions also include support for [[Perl]], [[Ruby programming language|Ruby]], [[Python programming language|Python]] and other [[Dynamic programming language|dynamic languages]].@@@@1@19@@danf@17-8-2009 10490960@unknown@formal@none@1@S@Examples of languages that are less common, but still well-supported, are [[C Sharp (programming language)|C#]] via the [[Mono (software)|Mono]] project, sponsored by [[Novell]], and [[Scheme programming language|Scheme]].@@@@1@27@@danf@17-8-2009 10490970@unknown@formal@none@1@S@A number of [[Java Virtual Machine]]s and development kits run on Linux, including the original Sun Microsystems JVM ([[HotSpot]]), and IBM's J2SE RE, as well as many open-source projects like [[Kaffe]].@@@@1@31@@danf@17-8-2009 10490980@unknown@formal@none@1@S@The two main frameworks for developing graphical applications are those of [[GNOME]] and [[KDE]].@@@@1@14@@danf@17-8-2009 10490990@unknown@formal@none@1@S@These projects are based on the [[GTK+]] and [[Qt (toolkit)|Qt]] [[widget toolkit]]s, respectively, which can also be used independently of the larger framework.@@@@1@23@@danf@17-8-2009 10491000@unknown@formal@none@1@S@Both support a wide variety of languages.@@@@1@7@@danf@17-8-2009 10491010@unknown@formal@none@1@S@There are a number of [[Integrated development environment]]s available including [[Anjuta]], [[Code::Blocks]], [[Eclipse (computing)|Eclipse]], [[KDevelop]], [[Lazarus (software)|Lazarus]], [[MonoDevelop]], [[NetBeans]], and [[Omnis Studio]] while the long-established editors [[Vim (text editor)|Vim]] and [[Emacs]] remain popular.@@@@1@33@@danf@17-8-2009 10491020@unknown@formal@none@1@S@== Uses ==@@@@1@3@@danf@17-8-2009 10491030@unknown@formal@none@1@S@As well as those designed for general purpose use on desktops and servers, distributions may be specialized for different purposes including: [[computer architecture]] support, [[Embedded Linux|embedded systems]], stability, security, localization to a specific region or language, targeting of specific user groups, support for [[real-time computing|real-time]] applications, or commitment to a given desktop environment.@@@@1@53@@danf@17-8-2009 10491040@unknown@formal@none@1@S@Furthermore, some distributions deliberately include only [[free software]].@@@@1@8@@danf@17-8-2009 10491050@unknown@formal@none@1@S@Currently, over three hundred distributions are actively developed, with about a dozen distributions being most popular for general-purpose use.@@@@1@19@@danf@17-8-2009 10491060@unknown@formal@none@1@S@Linux is a widely 
[[porting|ported]] operating system.@@@@1@7@@danf@17-8-2009 10491070@unknown@formal@none@1@S@While the Linux kernel was originally designed only for [[Intel 80386]] [[microprocessor]]s, it now runs on a more diverse range of [[computer architecture]]s than any other operating system: in the hand-held [[ARM architecture|ARM]]-based [[iPAQ]] and the [[mainframe computer|mainframe]] [[IBM]] [[System z9]], in devices ranging from [[mobile phone]]s to [[supercomputer]]s.@@@@1@49@@danf@17-8-2009 10491080@unknown@formal@none@1@S@Specialized distributions exist for less mainstream architectures.@@@@1@7@@danf@17-8-2009 10491090@unknown@formal@none@1@S@The [[ELKS]] kernel [[fork (software development)|fork]] can run on [[Intel 8086]] or [[Intel 80286]] [[16-bit]] microprocessors, while the [[µClinux]] kernel fork may run on systems without a [[memory management unit]].@@@@1@30@@danf@17-8-2009 10491100@unknown@formal@none@1@S@The kernel also runs on architectures that were only ever intended to use a manufacturer-created operating system, such as [[Macintosh]] computers, [[Personal digital assistant|PDA]]s, [[video game console]]s, [[Digital audio player|portable music players]], and [[mobile phone]]s.@@@@1@35@@danf@17-8-2009 10491110@unknown@formal@none@1@S@=== Desktop ===@@@@1@3@@danf@17-8-2009 10491120@unknown@formal@none@1@S@Although there is a lack of Linux ports for some [[Mac OS X]] and [[Microsoft Windows]] programs in domains such as [[desktop publishing]] and [[professional audio]], applications equivalent to those available for Mac and Windows are available for Linux.@@@@1@39@@danf@17-8-2009 10491130@unknown@formal@none@1@S@Most Linux distributions provide a program for browsing a list of thousands of [[free software]] applications that have already been tested and configured for a specific distribution.@@@@1@27@@danf@17-8-2009 10491140@unknown@formal@none@1@S@These free programs can be downloaded and installed with one mouse click and a digital signature guarantees that no one has added a virus or a spyware to these programs.@@@@1@30@@danf@17-8-2009 10491150@unknown@formal@none@1@S@Many [[free software]] titles that are popular on Windows, such as [[Pidgin (software)|Pidgin]], [[Mozilla Firefox]], [[Openoffice.org]], and [[GIMP]], are available for Linux.@@@@1@22@@danf@17-8-2009 10491160@unknown@formal@none@1@S@A growing amount of proprietary desktop software is also supported under Linux, examples being [[Adobe Flash Player]], [[Adobe Acrobat|Acrobat Reader]], [[Matlab]], [[Nero Burning ROM]], [[Opera (Internet suite)|Opera]], [[RealPlayer]], and [[Skype]].@@@@1@30@@danf@17-8-2009 10491170@unknown@formal@none@1@S@In the field of animation and visual effects, most high end software, such as AutoDesk Maya, Softimage XSI and Apple Shake, is available for Linux, Windows and/or Mac OS X.@@@@1@30@@danf@17-8-2009 10491180@unknown@formal@none@1@S@[[CrossOver]] is a proprietary solution based on the open source [[Wine (software)|Wine]] project that supports running older Windows versions of [[Microsoft Office]] and [[Adobe Photoshop]] versions through CS2.@@@@1@28@@danf@17-8-2009 10491190@unknown@formal@none@1@S@[[Microsoft Office 2007]] and Adobe Photoshop CS3 are known not to work.@@@@1@12@@danf@17-8-2009 10491200@unknown@formal@none@1@S@Besides the free Windows compatibility layer [[Wine (software)|Wine]], most distributions offer [[Dual boot]] and [[X86 virtualization]] for running both Linux and Windows on the same computer.@@@@1@26@@danf@17-8-2009 10491210@unknown@formal@none@1@S@Linux's 
open nature allows distributed teams to [[L10n|localize]] Linux distributions for use in locales where localizing proprietary systems would not be cost-effective.@@@@1@22@@danf@17-8-2009 10491220@unknown@formal@none@1@S@For example the [[Sinhalese language]] version of the [[Knoppix]] distribution was available for a long time before [[Microsoft Windows XP]] was translated to Sinhalese.@@@@1@24@@danf@17-8-2009 10491230@unknown@formal@none@1@S@In this case the Lanka Linux User Group played a major part in developing the localized system by combining the knowledge of university professors, [[linguist]]s, and local developers.@@@@1@28@@danf@17-8-2009 10491240@unknown@formal@none@1@S@The performance of Linux on the desktop has been a controversial topic, with at least one key Linux kernel developer, Con Kolivas, accusing the Linux community of favouring performance on servers.@@@@1@31@@danf@17-8-2009 10491250@unknown@formal@none@1@S@He quit Linux development because he was frustrated with this lack of focus on the desktop, and then gave a 'tell all' interview on the topic.@@@@1@26@@danf@17-8-2009 10491260@unknown@formal@none@1@S@=== Servers and supercomputers ===@@@@1@5@@danf@17-8-2009 10491270@unknown@formal@none@1@S@Historically, Linux has mainly been used as a [[Server (computing)|server]] operating system, and has risen to prominence in that area; [[Netcraft]] reported in September 2006 that eight of the ten most reliable internet hosting companies run Linux on their [[web server]]s.@@@@1@41@@danf@17-8-2009 10491280@unknown@formal@none@1@S@This is due to its relative stability and long uptime, and the fact that desktop software with a graphical user interface for servers is often unneeded.@@@@1@26@@danf@17-8-2009 10491290@unknown@formal@none@1@S@Enterprise and non-enterprise Linux distributions may be found running on servers.@@@@1@11@@danf@17-8-2009 10491300@unknown@formal@none@1@S@Linux is the cornerstone of the [[LAMP (software bundle)|LAMP]] server-software combination (Linux, [[Apache HTTP Server|Apache]], [[MySQL]], [[Perl]]/[[PHP]]/[[Python (programming language)|Python]]) which has achieved popularity among developers, and which is one of the more common platforms for website hosting.@@@@1@37@@danf@17-8-2009 10491310@unknown@formal@none@1@S@Linux is commonly used as an operating system for [[supercomputer]]s.@@@@1@10@@danf@17-8-2009 10491320@unknown@formal@none@1@S@As of [[November 2007]], out of the top 500 systems, 426 (85.2%) run Linux.@@@@1@14@@danf@17-8-2009 10491330@unknown@formal@none@1@S@=== Embedded devices ===@@@@1@4@@danf@17-8-2009 10491340@unknown@formal@none@1@S@Due to its low cost and ability to be easily modified, an [[embedded Linux]] is often used in [[embedded systems]].@@@@1@20@@danf@17-8-2009 10491350@unknown@formal@none@1@S@Linux has become a major competitor to the proprietary [[Symbian OS]] found in the majority of smartphones — 16.7% of [[smartphone]]s sold worldwide during 2006 were using Linux — and it is an alternative to the proprietary [[Windows CE]] and [[Palm OS]] operating systems on [[mobile device]]s.@@@@1@47@@danf@17-8-2009 10491360@unknown@formal@none@1@S@Cell phones or PDAs running on Linux and built on open source platform became a trend from 2007, like [[Nokia N810]], [[Openmoko]]'s [[Neo1973]] and the on-going [[Google Android]].@@@@1@28@@danf@17-8-2009 10491370@unknown@formal@none@1@S@The popular [[TiVo]] digital video recorder uses a customized version of Linux.@@@@1@12@@danf@17-8-2009 10491380@unknown@formal@none@1@S@Several network 
[[firewall]] and [[router]] standalone products, including several from [[Linksys]], use Linux internally, using its advanced firewall and routing capabilities.@@@@1@21@@danf@17-8-2009 10491390@unknown@formal@none@1@S@The [[Korg OASYS]] and the [[Yamaha Motif|Yamaha Motif XS]] [[music workstation]]s also run Linux.@@@@1@14@@danf@17-8-2009 10491400@unknown@formal@none@1@S@Further more Linux is used in the leading [[stage lighting]] control system, FlyingPig/HighEnd WholeHogIII Console .@@@@1@16@@danf@17-8-2009 10491410@unknown@formal@none@1@S@=== Market share and uptake ===@@@@1@6@@danf@17-8-2009 10491420@unknown@formal@none@1@S@Many quantitative studies of open source software focus on topics including market share and reliability, with numerous studies specifically examining Linux.@@@@1@21@@danf@17-8-2009 10491430@unknown@formal@none@1@S@The Linux market is growing rapidly, and the revenue of servers, desktops, and packaged software running Linux is expected to exceed $35.7 billion by 2008.@@@@1@25@@danf@17-8-2009 10491440@unknown@formal@none@1@S@[[International Data Corporation|IDC]]'s report for Q1 2007 says that Linux now holds 12.7% of the overall server market.@@@@1@18@@danf@17-8-2009 10491450@unknown@formal@none@1@S@This estimate was based on the number of Linux servers sold by various companies.@@@@1@14@@danf@17-8-2009 10491460@unknown@formal@none@1@S@Desktop adoption of Linux is approximately 1%.@@@@1@7@@danf@17-8-2009 10491470@unknown@formal@none@1@S@In comparison, [[List of Microsoft operating systems|Microsoft operating systems]] hold more than 90%.@@@@1@13@@danf@17-8-2009 10491480@unknown@formal@none@1@S@The frictional cost of switching operating systems and lack of support for certain hardware and application programs designed for [[Microsoft Windows]] have been two factors that have inhibited adoption.@@@@1@29@@danf@17-8-2009 10491490@unknown@formal@none@1@S@Proponents and analysts attribute the relative success of Linux to its security, reliability, low cost, and freedom from [[vendor lock-in]].@@@@1@20@@danf@17-8-2009 10491500@unknown@formal@none@1@S@Also most recently Google has begun to fund [[Wine (software)|Wine]], which acts as a compatibility layer, allowing users to run some Windows programs under Linux.@@@@1@25@@danf@17-8-2009 10491510@unknown@formal@none@1@S@The [[OLPC XO-1|XO laptop]] project of One Laptop Per Child is creating a new and potentially much larger Linux community, planned to reach [http://www.laptop.org/en/vision/mission/index.shtml several hundred million schoolchildren] and their families and communities in developing countries.@@@@1@36@@danf@17-8-2009 10491515@unknown@formal@none@1@S@[http://wiki.laptop.org/go/countries Six countries] have ordered a million or more units each for delivery in 2007 to distribute to schoolchildren at no charge.@@@@1@22@@danf@17-8-2009 10491520@unknown@formal@none@1@S@[[Google]], [[Red Hat]], and [[eBay]] are major supporters of the project.@@@@1@11@@danf@17-8-2009 10491530@unknown@formal@none@1@S@== Copyright and naming ==@@@@1@5@@danf@17-8-2009 10491540@unknown@formal@none@1@S@The Linux kernel and most GNU software are [[software license|license]]d under the [[GNU General Public License]] (GPL).@@@@1@17@@danf@17-8-2009 10491550@unknown@formal@none@1@S@The GPL requires that anyone who distributes the Linux kernel must make the source code (and any modifications) available to the recipient under the same terms.@@@@1@26@@danf@17-8-2009 10491560@unknown@formal@none@1@S@In 1997, Linus Torvalds stated, “Making Linux 
GPL'd was definitely the best thing I ever did.”@@@@1@16@@danf@17-8-2009 10491570@unknown@formal@none@1@S@Other key components of a Linux system may use other licenses; many libraries use the [[GNU Lesser General Public License]] (LGPL), a more permissive variant of the GPL, and the [[X Window System]] uses the [[MIT License]].@@@@1@37@@danf@17-8-2009 10491580@unknown@formal@none@1@S@Torvalds has publicly stated that he would not move the Linux kernel (currently licensed under GPL version 2) to version 3 of the GPL, released in mid-2007, specifically citing some provisions in the new license which prohibit the use of the software in [[digital rights management]].@@@@1@46@@danf@17-8-2009 10491590@unknown@formal@none@1@S@A 2001 study of [[Red Hat Linux]] 7.1 found that this distribution contained 30 million [[source lines of code]].@@@@1@19@@danf@17-8-2009 10491600@unknown@formal@none@1@S@Using the [[COCOMO|Constructive Cost Model]], the study estimated that this distribution required about eight thousand man-years of development time.@@@@1@19@@danf@17-8-2009 10491610@unknown@formal@none@1@S@According to the study, if all this software had been developed by conventional [[proprietary software|proprietary]] means, it would have cost about 1.08 billion dollars (year 2000 U.S. dollars) to develop in the United States.@@@@1@34@@danf@17-8-2009 10491620@unknown@formal@none@1@S@Most of the code (71%) was written in the [[C (programming language)|C]] [[computer programming|programming]] [[programming language|language]], but many other languages were used, including [[C++]], [[assembly language]], [[Perl]], [[Python (programming language)|Python]], [[Fortran]], and various [[shell script]]ing languages.@@@@1@36@@danf@17-8-2009 10491630@unknown@formal@none@1@S@Slightly over half of all lines of code were licensed under the GPL.@@@@1@13@@danf@17-8-2009 10491640@unknown@formal@none@1@S@The Linux kernel itself was 2.4 million lines of code, or 8% of the total.@@@@1@15@@danf@17-8-2009 10491650@unknown@formal@none@1@S@In a later study, the same analysis was performed for Debian GNU/Linux version 4.0.@@@@1@14@@danf@17-8-2009 10491660@unknown@formal@none@1@S@This distribution contained over 283 million source lines of code, and the study estimated that it would have cost 5.4 billion Euros to develop by conventional means.@@@@1@27@@danf@17-8-2009 10491670@unknown@formal@none@1@S@In the United States, the name ''Linux'' is a [[trademark]] registered to Linus Torvalds.@@@@1@14@@danf@17-8-2009 10491680@unknown@formal@none@1@S@Initially, nobody registered it, but on [[August 15]] [[1994]], William R. Della Croce, Jr. 
filed for the trademark ''Linux'', and then demanded royalties from Linux distributors.@@@@1@26@@danf@17-8-2009 10491690@unknown@formal@none@1@S@In 1996, Torvalds and some affected organizations sued him to have the trademark assigned to Torvalds, and in 1997 the case was settled.@@@@1@23@@danf@17-8-2009 10491700@unknown@formal@none@1@S@The licensing of the trademark has since been handled by the [[Linux Mark Institute]].@@@@1@14@@danf@17-8-2009 10491710@unknown@formal@none@1@S@Torvalds has stated that he only trademarked the name to prevent someone else from using it, but was bound in 2005 by [[United States trademark law]] to take active measures to enforce the trademark.@@@@1@34@@danf@17-8-2009 10491720@unknown@formal@none@1@S@As a result, the LMI sent out a number of letters to distribution vendors requesting that a fee be paid for the use of the name, and a number of companies have complied.@@@@1@33@@danf@17-8-2009 10491730@unknown@formal@none@1@S@=== GNU/Linux ===@@@@1@3@@danf@17-8-2009 10491740@unknown@formal@none@1@S@The [[Free Software Foundation]] views Linux distributions which use GNU software as [[GNU variants]] and they ask that such operating systems be referred to as ''GNU/Linux'' or ''a Linux-based GNU system''.@@@@1@31@@danf@17-8-2009 10491750@unknown@formal@none@1@S@However, the media and population at large refers to this family of operating systems simply as ''Linux''.@@@@1@17@@danf@17-8-2009 10491760@unknown@formal@none@1@S@While some distributors make a point of using the aggregate form, most notably [[Debian]] with the ''[[Debian GNU/Linux]]'' distribution, the term's use outside of the enthusiast community is limited.@@@@1@29@@danf@17-8-2009 10491770@unknown@formal@none@1@S@The distinction between the Linux kernel and distributions based on it plus the GNU system is a source of confusion to many newcomers, and the naming remains controversial, as many large Linux distributions (e.g. [[Ubuntu]] and [[SuSE]] Linux) are simply using the ''Linux'' name, rather than ''GNU/Linux''.@@@@1@47@@danf@17-8-2009 10500010@unknown@formal@none@1@S@
List of chatterbots
@@@@1@3@@danf@17-8-2009 10500020@unknown@formal@none@1@S@==Chatterbot Directories==@@@@1@2@@danf@17-8-2009 10500030@unknown@formal@none@1@S@*@@@@1@1@@danf@17-8-2009 10500040@unknown@formal@none@1@S@*[http://www.simonlaven.com Chatterbot Central] at [http://www.simonlaven.com The Simon Laven Page]@@@@1@9@@danf@17-8-2009 10500050@unknown@formal@none@1@S@*[http://www.aidreams.co.uk/chatterbotcollection/index.htm The Chatterbot Collection]@@@@1@4@@danf@17-8-2009 10500060@unknown@formal@none@1@S@*[http://www.aihub.org AI Hub] - A directory of news, programs, and links all related to chatterbots and Artificial Intelligence@@@@1@18@@danf@17-8-2009 10500070@unknown@formal@none@1@S@*[http://www.chatterboxchallenge.com/bots_dir.php The Chatterbox Challenge Bots Directory] at [http://www.chatterboxchallenge.com The Chatterbox Challenge]@@@@1@11@@danf@17-8-2009 10500080@unknown@formal@none@1@S@==Classic Chatterbots==@@@@1@2@@danf@17-8-2009 10500090@unknown@formal@none@1@S@*[[Dr. Sbaitso]]@@@@1@2@@danf@17-8-2009 10500100@unknown@formal@none@1@S@*[[ELIZA]]@@@@1@1@@danf@17-8-2009 10500110@unknown@formal@none@1@S@*[[PARRY]]@@@@1@1@@danf@17-8-2009 10500120@unknown@formal@none@1@S@*[[Racter]]@@@@1@1@@danf@17-8-2009 10500130@unknown@formal@none@1@S@==General Chatterbots==@@@@1@2@@danf@17-8-2009 10500140@unknown@formal@none@1@S@*[[Artificial Linguistic Internet Computer Entity|A.L.I.C.E.]] and other [[Alicebot]]/pandorabot-based ([http://www.titane.ca/concordia/dfar251/igod/main.html iGod], [http://www.mousebreaker.com/games/chatbot/play.php Mitsuku], [http://www.friendbot.co.uk FriendBot], etc.)@@@@1@15@@danf@17-8-2009 10500150@unknown@formal@none@1@S@*[[Albert One]]@@@@1@2@@danf@17-8-2009 10500160@unknown@formal@none@1@S@*[[ALIMbot]]@@@@1@1@@danf@17-8-2009 10500170@unknown@formal@none@1@S@*[[CHAT and TIPS]]@@@@1@3@@danf@17-8-2009 10500180@unknown@formal@none@1@S@*[http://www.chat-bot.com Chat-bot]@@@@1@2@@danf@17-8-2009 10500190@unknown@formal@none@1@S@*[[Claude Chatterbot|Claude]]@@@@1@2@@danf@17-8-2009 10500200@unknown@formal@none@1@S@*[http://www.dadorac.com Dadorac]@@@@1@2@@danf@17-8-2009 10500210@unknown@formal@none@1@S@*[http://www.dai2.co.uk/ DAI2] - A dynamic artificial intelligence which learns from its surrounding community@@@@1@13@@danf@17-8-2009 10500220@unknown@formal@none@1@S@*[http://www.elbot.com/ Elbot]@@@@1@2@@danf@17-8-2009 10500230@unknown@formal@none@1@S@*[[Ella Chatterbot|Ella]]@@@@1@2@@danf@17-8-2009 10500240@unknown@formal@none@1@S@*[[Fred Chatterbot|Fred]]@@@@1@2@@danf@17-8-2009 10500250@unknown@formal@none@1@S@*[[Jabberwacky]]@@@@1@1@@danf@17-8-2009 10500260@unknown@formal@none@1@S@*[http://www.abenteuermedien.de/jabberwock Jabberwock]@@@@1@2@@danf@17-8-2009 10500270@unknown@formal@none@1@S@*[http://www.jeeney.com/ Jeeney AI]@@@@1@3@@danf@17-8-2009 10500280@unknown@formal@none@1@S@*[http://www.jixperts.com?lang=en JIxperts] – collection of wiki chatterbots.@@@@1@7@@danf@17-8-2009 10500290@unknown@formal@none@1@S@*[http://www.iaindustrie.fr.nf KAR Intelligent Computer]@@@@1@4@@danf@17-8-2009 10500300@unknown@formal@none@1@S@*[http://www.leeds-city-guide.com/kyle Kyle] – A unique learning Artificial Intelligence chatbot, which employs contextual learning algorithms.@@@@1@14@@danf@17-8-2009 10500310@unknown@formal@none@1@S@*[[MegaHal]]@@@@1@1@@danf@17-8-2009 10500320@unknown@formal@none@1@S@*[[Mr Know-It-All]]@@@@1@2@@danf@17-8-2009 10500330@unknown@formal@none@1@S@*Oliverbot@@@@1@1@@danf@17-8-2009 10500340@unknown@formal@none@1@S@*[http://uk.geocities.com/mattbrown1101/ 
Poseidon]@@@@1@2@@danf@17-8-2009 10500350@unknown@formal@none@1@S@*[http://www.infradrive.com/robomatic.php RoboMatic X1] - A chatbot which controls the user's PC through chatting by their voice or by typing.@@@@1@19@@danf@17-8-2009 10500360@unknown@formal@none@1@S@*[http://www.cooldictionary.com/splotchy.mpl Splotchy]@@@@1@2@@danf@17-8-2009 10500370@unknown@formal@none@1@S@*[[Starship Titanic#Spookitalk|Spookitalk]] - A chatterbot used for [[Non-player character|NPC]]s in [[Douglas Adams]]' ''Starship Titanic'' video game.@@@@1@16@@danf@17-8-2009 10500380@unknown@formal@none@1@S@*[http://www.onebigspace.com/ Thomas]@@@@1@2@@danf@17-8-2009 10500390@unknown@formal@none@1@S@*[[Ultra Hal Assistant]]@@@@1@3@@danf@17-8-2009 10500400@unknown@formal@none@1@S@*[[Verbot]]@@@@1@1@@danf@17-8-2009 10500410@unknown@formal@none@1@S@*[http://www.yhaken.com/ Yhaken]@@@@1@2@@danf@17-8-2009 10500420@unknown@formal@none@1@S@*[http://www.scientiobot.com ScientioBot] - A new technology chatterbot using concept mining techniques accessible via a free web service.@@@@1@17@@danf@17-8-2009 10500430@unknown@formal@none@1@S@*[http://nicole.jetaylor.net NICOLE] A simple chatterbot with the ability to learn new phrases.@@@@1@12@@danf@17-8-2009 10500440@unknown@formal@none@1@S@==[[Instant messenger|IM]] Chatterbots==@@@@1@3@@danf@17-8-2009 10500450@unknown@formal@none@1@S@*DAI2 is also available on the MSN / Windows Live network as dai2\sdai2.co.uk@@@@1@13@@danf@17-8-2009 10500460@unknown@formal@none@1@S@*[http://www.dnreg.org/bot/ MSN Quickbot]@@@@1@3@@danf@17-8-2009 10500470@unknown@formal@none@1@S@*[http://www.smarterchild.com SmarterChild]@@@@1@2@@danf@17-8-2009 10500480@unknown@formal@none@1@S@*[http://www.spleak.com Spleak]@@@@1@2@@danf@17-8-2009 10500490@unknown@formal@none@1@S@*[http://www.mrmovie.com MrMovie] - searching actors/movies/dvd's in IM (Skype, AOL/AIM or MSN/Live)@@@@1@11@@danf@17-8-2009 10500500@unknown@formal@none@1@S@*[[Inside Messenger Bot|InsideMessenger]]@@@@1@3@@danf@17-8-2009 10500510@unknown@formal@none@1@S@*[http://www.inocu.jt-online.co.uk Inocu] - (MSN/Live)@@@@1@4@@danf@17-8-2009 10500520@unknown@formal@none@1@S@*[http://www.friendbot.co.uk FriendBot-An AIM Chatterbot]@@@@1@4@@danf@17-8-2009 10500530@unknown@formal@none@1@S@*[http://www.amsn-project.net/plugins.php amsnEliza plugin for aMSN]@@@@1@5@@danf@17-8-2009 10500540@unknown@formal@none@1@S@*[[Inside Messenger Bot|TrixieMouse]]@@@@1@3@@danf@17-8-2009 10500550@unknown@formal@none@1@S@*[http://www.infobot.pl/ Infobot] - Polish informational bot for Gadu-gadu, Skype and Jabber@@@@1@11@@danf@17-8-2009 10500560@unknown@formal@none@1@S@==AIML Chatterbots==@@@@1@2@@danf@17-8-2009 10500570@unknown@formal@none@1@S@*[http://www.taik.fi/turingenigma Alan] - In ''Turing Enigma'' Alan Turing's spirit has infiltrated the World War II encrypting device Enigma.@@@@1@18@@danf@17-8-2009 10500580@unknown@formal@none@1@S@*[http://www.dustyant.com/projects/deebot/ Deeb0t]@@@@1@2@@danf@17-8-2009 10500590@unknown@formal@none@1@S@*[http://www.pandorabots.com/pandora/talk?botid=b0dafd24ee35a477 Chomsky] A chatbot that uses a smiley face to convey emotions.@@@@1@12@@danf@17-8-2009 10500600@unknown@formal@none@1@S@It uses the information in Wikipedia to build its conversations and has links to Wikipedia articles.@@@@1@16@@danf@17-8-2009 10500610@unknown@formal@none@1@S@*[[John Lennon Artificial Intelligence Project]]@@@@1@5@@danf@17-8-2009 10500620@unknown@formal@none@1@S@*[[SitePal]]@@@@1@1@@danf@17-8-2009 10500630@unknown@formal@none@1@S@==JFred 
Chatterbots==@@@@1@2@@danf@17-8-2009 10500640@unknown@formal@none@1@S@*[[The Turing Hub]]@@@@1@3@@danf@17-8-2009 10500650@unknown@formal@none@1@S@==Educational Chatterbots==@@@@1@2@@danf@17-8-2009 10500660@unknown@formal@none@1@S@*[http://www.philocomp.net/?pageref=ai&page=elizabeth Elizabeth] Aims to teach AI techniques and concepts, starting from chatterbot design.@@@@1@13@@danf@17-8-2009 10500670@unknown@formal@none@1@S@Accompanied by self-teaching materials, as used at the University of Leeds.@@@@1@11@@danf@17-8-2009 10500680@unknown@formal@none@1@S@==Non-English Chatterbots==@@@@1@2@@danf@17-8-2009 10500690@unknown@formal@none@1@S@*[http://www.geocities.com/brizglace/amanda.htm Amanda] - (French) with source code for Windows.@@@@1@9@@danf@17-8-2009 10500700@unknown@formal@none@1@S@*[[Proteus]]@@@@1@1@@danf@17-8-2009 10500710@unknown@formal@none@1@S@*[msnim:chat?contact=senhorbot\shotmail.com Senhor Bot] (Brazilian bot for MSN)@@@@1@7@@danf@17-8-2009 10500720@unknown@formal@none@1@S@[[Category:Chatterbots|*]]@@@@1@1@@danf@17-8-2009 10500730@unknown@formal@none@1@S@[[bn:চ্যাটারবটসমূহের তালিকা]]@@@@1@2@@danf@17-8-2009 10510010@unknown@formal@none@1@S@
Loebner Prize
@@@@1@2@@danf@17-8-2009 10510020@unknown@formal@none@1@S@The '''Loebner Prize''' is an annual competition that awards prizes to the [[Chatterbot]] considered by the judges to be the most [[Artificial intelligence|humanlike]] of those entered.@@@@1@26@@danf@17-8-2009 10510030@unknown@formal@none@1@S@The format of the competition is that of a standard [[Turing test]].@@@@1@12@@danf@17-8-2009 10510040@unknown@formal@none@1@S@In the Loebner Prize, as in a Turing test, a human judge is faced with two computer screens.@@@@1@18@@danf@17-8-2009 10510050@unknown@formal@none@1@S@One is under the control of a computer, the other is under the control of a human.@@@@1@17@@danf@17-8-2009 10510060@unknown@formal@none@1@S@The judge poses questions to the two screens and receives answers.@@@@1@11@@danf@17-8-2009 10510070@unknown@formal@none@1@S@Based upon the answers, the judge must decide which screen is controlled by the human and which is controlled by the computer program.@@@@1@23@@danf@17-8-2009 10510080@unknown@formal@none@1@S@The contest was begun in 1990 by [[Hugh Loebner]] in conjunction with the [[Cambridge Center for Behavioral Studies]] of [[Massachusetts]], [[United States]].@@@@1@22@@danf@17-8-2009 10510090@unknown@formal@none@1@S@It has since been associated with [[Flinders University]], [[Dartmouth College]], the [[Science Museum (London)|Science Museum]] in [[London]], and most recently the [[University of Reading]].@@@@1@24@@danf@17-8-2009 10510100@unknown@formal@none@1@S@Within the field of artificial intelligence, the Loebner Prize is somewhat controversial; the most prominent critic, [[Marvin Minsky]], has called it a publicity stunt that does not help the field along.@@@@1@31@@danf@17-8-2009 10510110@unknown@formal@none@1@S@==Prizes==@@@@1@1@@danf@17-8-2009 10510120@unknown@formal@none@1@S@The prizes for each year include:@@@@1@6@@danf@17-8-2009 10510130@unknown@formal@none@1@S@* $2,000 for the most human-seeming of all chatterbots for that year - awarded every year.@@@@1@16@@danf@17-8-2009 10510140@unknown@formal@none@1@S@In 2005, the prize was increased to $3,000, and the prize was $2,250 in 2006.@@@@1@15@@danf@17-8-2009 10510150@unknown@formal@none@1@S@In 2008 the prize will be $3000.00@@@@1@7@@danf@17-8-2009 10510160@unknown@formal@none@1@S@* $25,000 for the first chatterbot that judges cannot distinguish from a real human in a text-only Turing test, and that can convince judges that the other (human) entity they are talking to simultaneously is a computer.@@@@1@37@@danf@17-8-2009 10510165@unknown@formal@none@1@S@''(to be awarded once only)''@@@@1@5@@danf@17-8-2009 10510170@unknown@formal@none@1@S@* $100,000 to the first chatterbot that judges cannot distinguish from a real human in a Turing test that includes deciphering and understanding text, visual, and auditory input.@@@@1@28@@danf@17-8-2009 10510175@unknown@formal@none@1@S@''(to be awarded once only)''@@@@1@5@@danf@17-8-2009 10510180@unknown@formal@none@1@S@The Loebner Prize dissolves once the $100,000 prize is won.@@@@1@10@@danf@17-8-2009 10510190@unknown@formal@none@1@S@==2008 Loebner Prize==@@@@1@3@@danf@17-8-2009 10510200@unknown@formal@none@1@S@The 2008 Competition is to be held on Sunday [[12 October]] in University of Reading, [[United Kingdom|UK]].@@@@1@17@@danf@17-8-2009 10510210@unknown@formal@none@1@S@The event, which is being co-directed by [[Kevin Warwick]], will include a direct challenge on the [[Turing test]] as originally proposed by [[Alan Turing]].@@@@1@24@@danf@17-8-2009 
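As a rough illustration of the blind question-and-answer format described above, the sketch below routes a judge's questions to two anonymous "screens", one driven by a program and one by a (here simulated) human, and hides which is which until after the transcript is produced. It is a minimal, hypothetical sketch of the setup, not code from any actual contest: the trivial keyword responder and the canned "human" replies are invented stand-ins.

```python
import random

# Minimal sketch of a Turing-test-style round. Both respondents are hypothetical
# stand-ins invented for this illustration, not contest software.

def chatterbot_reply(question):
    """Deliberately simple ELIZA-like responder."""
    q = question.lower()
    if "name" in q:
        return "I'd rather talk about you. What is your name?"
    if q.endswith("?"):
        return "Why do you ask that?"
    return "Tell me more."

_canned_human = iter([
    "I'm Sam, nice to meet you.",
    "Mostly reading, and hiking at the weekend.",
])

def human_reply(question):
    """Stand-in for the hidden human confederate."""
    return next(_canned_human, "Sorry, could you say that again?")

def run_round(questions):
    respondents = [("program", chatterbot_reply), ("human", human_reply)]
    random.shuffle(respondents)                      # hide which screen is which
    screens = {"Screen A": respondents[0], "Screen B": respondents[1]}
    transcript = {name: [] for name in screens}
    for question in questions:
        for name, (_identity, reply) in screens.items():
            transcript[name].append((question, reply(question)))
    answer_key = {name: identity for name, (identity, _reply) in screens.items()}
    return transcript, answer_key                    # the judge sees only the transcript

if __name__ == "__main__":
    transcript, key = run_round(["What is your name?", "What do you do for fun?"])
    for screen, exchanges in transcript.items():
        for q, a in exchanges:
            print(f"{screen} | Q: {q} | A: {a}")
    # A judge would now guess which screen was human; `key` reveals the assignment.
```

In the contest itself it is the judges' rankings across all entrants, rather than a single guess, that decide the annual award described above.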
10510220@unknown@formal@none@1@S@The first place winner will receive $3000.00 and a bronze medal.@@@@1@11@@danf@17-8-2009 10510230@unknown@formal@none@1@S@==2007 Loebner Prize==@@@@1@3@@danf@17-8-2009 10510240@unknown@formal@none@1@S@The 2007 Competition was held on Sunday, [[21 October]] in [[New York City]].@@@@1@13@@danf@17-8-2009 10510250@unknown@formal@none@1@S@The participants in the contest were:@@@@1@6@@danf@17-8-2009 10510260@unknown@formal@none@1@S@* [[Rollo Carpenter]] from Icogno, creator of [[Jabberwacky]]@@@@1@8@@danf@17-8-2009 10510270@unknown@formal@none@1@S@* Noah Duncan, private entry, creator of Cletus@@@@1@8@@danf@17-8-2009 10510280@unknown@formal@none@1@S@* Robert Medeksza from Zabaware, creator of [[Ultra Hal Assistant]]@@@@1@10@@danf@17-8-2009 10510290@unknown@formal@none@1@S@No bot passed the Turing test but the judges ranked the bots as "most human".@@@@1@15@@danf@17-8-2009 10510300@unknown@formal@none@1@S@The results of the contest were:@@@@1@6@@danf@17-8-2009 10510310@unknown@formal@none@1@S@* 1st place: Robert Medeksza@@@@1@5@@danf@17-8-2009 10510320@unknown@formal@none@1@S@* 2nd place: Noah Duncan@@@@1@5@@danf@17-8-2009 10510330@unknown@formal@none@1@S@* 3rd place: Rollo Carpenter@@@@1@5@@danf@17-8-2009 10510340@unknown@formal@none@1@S@The winner received $2250 and the Annual Medal.@@@@1@8@@danf@17-8-2009 10510350@unknown@formal@none@1@S@The runners up received $250 each.@@@@1@6@@danf@17-8-2009 10510360@unknown@formal@none@1@S@==2006 Loebner Prize==@@@@1@3@@danf@17-8-2009 10510370@unknown@formal@none@1@S@On Wednesday, [[August 30]], the finalists for the 2006 Loebner Prize were announced.@@@@1@13@@danf@17-8-2009 10510380@unknown@formal@none@1@S@The finalists were:@@@@1@3@@danf@17-8-2009 10510390@unknown@formal@none@1@S@* Rollo Carpenter@@@@1@3@@danf@17-8-2009 10510400@unknown@formal@none@1@S@* Richard Churchill and Marie-Claire Jenkins@@@@1@6@@danf@17-8-2009 10510410@unknown@formal@none@1@S@* Noah Duncan@@@@1@3@@danf@17-8-2009 10510420@unknown@formal@none@1@S@* Robert Medeksza@@@@1@3@@danf@17-8-2009 10510430@unknown@formal@none@1@S@The contest was held on Sunday, [[17 September]] at the Torrington Theatre, [[University College London]].@@@@1@15@@danf@17-8-2009 10510440@unknown@formal@none@1@S@==Winners==@@@@1@1@@danf@17-8-2009 10520010@unknown@formal@none@1@S@
Machine learning
@@@@1@2@@danf@17-8-2009 10520020@unknown@formal@none@1@S@As a broad subfield of [[artificial intelligence]], '''machine learning''' is concerned with the design and development of [[algorithm]]s and techniques that allow computers to "learn".@@@@1@25@@danf@17-8-2009 10520030@unknown@formal@none@1@S@At a general level, there are two types of learning: [[Inductive reasoning|inductive]], and [[Deductive reasoning|deductive]].@@@@1@15@@danf@17-8-2009 10520040@unknown@formal@none@1@S@Inductive machine learning methods extract rules and patterns out of massive data sets.@@@@1@13@@danf@17-8-2009 10520050@unknown@formal@none@1@S@The major focus of machine learning research is to extract information from data automatically, by computational and statistical methods.@@@@1@19@@danf@17-8-2009 10520060@unknown@formal@none@1@S@Hence, machine learning is closely related not only to [[data mining]] and [[statistics]], but also [[theoretical computer science]].@@@@1@18@@danf@17-8-2009 10520070@unknown@formal@none@1@S@==Applications==@@@@1@1@@danf@17-8-2009 10520080@unknown@formal@none@1@S@Machine learning has a wide spectrum of applications including [[natural language processing]], [[syntactic pattern recognition]], [[search engines]], [[diagnosis|medical diagnosis]], [[bioinformatics]], [[brain-machine interfaces]] and [[cheminformatics]], detecting [[credit card fraud]], [[stock market]] analysis, classifying [[DNA sequence]]s, [[speech recognition|speech]] and [[handwriting recognition]], [[object recognition]] in [[computer vision]], [[strategy game|game playing]] and [[robot locomotion]].@@@@1@50@@danf@17-8-2009 10520090@unknown@formal@none@1@S@== Human interaction ==@@@@1@4@@danf@17-8-2009 10520100@unknown@formal@none@1@S@Some machine learning systems attempt to eliminate the need for human intuition in the analysis of the data, while others adopt a collaborative approach between human and machine.@@@@1@28@@danf@17-8-2009 10520110@unknown@formal@none@1@S@Human intuition cannot be entirely eliminated since the designer of the system must specify how the data is to be represented and what mechanisms will be used to search for a characterization of the data.@@@@1@35@@danf@17-8-2009 10520120@unknown@formal@none@1@S@Machine learning can be viewed as an attempt to automate parts of the [[scientific method]].@@@@1@15@@danf@17-8-2009 10520130@unknown@formal@none@1@S@Some statistical machine learning researchers create methods within the framework of [[Bayesian statistics]].@@@@1@13@@danf@17-8-2009 10520140@unknown@formal@none@1@S@== Algorithm types ==@@@@1@4@@danf@17-8-2009 10520150@unknown@formal@none@1@S@Machine learning [[algorithm]]s are organized into a [[taxonomy]], based on the desired outcome of the algorithm.@@@@1@16@@danf@17-8-2009 10520160@unknown@formal@none@1@S@Common algorithm types include:@@@@1@4@@danf@17-8-2009 10520170@unknown@formal@none@1@S@* [[Supervised learning]] — in which the algorithm generates a function that maps inputs to desired outputs.@@@@1@17@@danf@17-8-2009 10520180@unknown@formal@none@1@S@One standard formulation of the supervised learning task is the [[statistical classification|classification]] problem: the learner is required to learn (to approximate) the behavior of a function which maps a vector [X_1, X_2, \\ldots X_N]\\, into one of several classes by looking at several input-output examples of the function.@@@@1@48@@danf@17-8-2009 10520190@unknown@formal@none@1@S@* [[Unsupervised learning]] — An agent which models a set of inputs: labeled examples are not 
available.@@@@1@17@@danf@17-8-2009 10520200@unknown@formal@none@1@S@* [[Semi-supervised learning]] — which combines both labeled and unlabeled examples to generate an appropriate function or classifier.@@@@1@18@@danf@17-8-2009 10520210@unknown@formal@none@1@S@* [[Reinforcement learning]] — in which the algorithm learns a policy of how to act given an observation of the world.@@@@1@21@@danf@17-8-2009 10520220@unknown@formal@none@1@S@Every action has some impact in the environment, and the environment provides feedback that guides the learning algorithm.@@@@1@18@@danf@17-8-2009 10520230@unknown@formal@none@1@S@* [[Transduction (machine learning)|Transduction]] — similar to supervised learning, but does not explicitly construct a function: instead, tries to predict new outputs based on training inputs, training outputs, and test inputs which are available while training.@@@@1@36@@danf@17-8-2009 10520240@unknown@formal@none@1@S@* [[Leaning to learn]] — in which the algorithm learns its own [[inductive bias]] based on previous experience.@@@@1@18@@danf@17-8-2009 10520250@unknown@formal@none@1@S@The computational analysis of machine learning algorithms and their performance is a branch of [[theoretical computer science]] known as [[computational learning theory]].@@@@1@22@@danf@17-8-2009 10520260@unknown@formal@none@1@S@== Machine learning topics ==@@@@1@5@@danf@17-8-2009 10520270@unknown@formal@none@1@S@:''This list represents the topics covered on a typical machine learning course.''@@@@1@12@@danf@17-8-2009 10520280@unknown@formal@none@1@S@;Prerequisites@@@@1@1@@danf@17-8-2009 10520290@unknown@formal@none@1@S@*[[Bayesian theory]]@@@@1@2@@danf@17-8-2009 10520300@unknown@formal@none@1@S@;Modeling [[conditional probability|conditional probability density functions]]: [[Regression analysis|regression]] and [[Statistical classification|classification]]@@@@1@12@@danf@17-8-2009 10520310@unknown@formal@none@1@S@*[[Artificial neural network]]s@@@@1@3@@danf@17-8-2009 10520320@unknown@formal@none@1@S@*[[Decision tree]]s@@@@1@2@@danf@17-8-2009 10520330@unknown@formal@none@1@S@*[[Gene expression programming]]@@@@1@3@@danf@17-8-2009 10520340@unknown@formal@none@1@S@*[[Genetic algorithms]]@@@@1@2@@danf@17-8-2009 10520350@unknown@formal@none@1@S@*[[Genetic programming]]@@@@1@2@@danf@17-8-2009 10520360@unknown@formal@none@1@S@*[[Holographic associative memory]]@@@@1@3@@danf@17-8-2009 10520370@unknown@formal@none@1@S@*[[Inductive Logic Programming]]@@@@1@3@@danf@17-8-2009 10520380@unknown@formal@none@1@S@*[[Kriging|Gaussian process regression]]@@@@1@3@@danf@17-8-2009 10520390@unknown@formal@none@1@S@*[[Linear discriminant analysis]]@@@@1@3@@danf@17-8-2009 10520400@unknown@formal@none@1@S@*[[Nearest neighbor (pattern recognition)|K-nearest neighbor]]@@@@1@5@@danf@17-8-2009 10520410@unknown@formal@none@1@S@*[[Minimum message length]]@@@@1@3@@danf@17-8-2009 10520420@unknown@formal@none@1@S@*[[Perceptron]]@@@@1@1@@danf@17-8-2009 10520430@unknown@formal@none@1@S@*[[Quadratic classifier]]@@@@1@2@@danf@17-8-2009 10520440@unknown@formal@none@1@S@*[[Radial basis function network]]s@@@@1@4@@danf@17-8-2009 10520450@unknown@formal@none@1@S@*[[Support vector machine]]s@@@@1@3@@danf@17-8-2009 10520460@unknown@formal@none@1@S@;Algorithms for estimating model parameters:@@@@1@5@@danf@17-8-2009 10520470@unknown@formal@none@1@S@*[[Dynamic programming]]@@@@1@2@@danf@17-8-2009 10520480@unknown@formal@none@1@S@*[[Expectation-maximization algorithm]]@@@@1@2@@danf@17-8-2009 10520490@unknown@formal@none@1@S@;Modeling [[probability 
density function]]s through [[generative model]]s:@@@@1@7@@danf@17-8-2009 10520500@unknown@formal@none@1@S@*[[Graphical model]]s including [[Bayesian network]]s and [[Markov network|Markov random fields]]@@@@1@10@@danf@17-8-2009 10520510@unknown@formal@none@1@S@*[[Generative topographic map]]@@@@1@3@@danf@17-8-2009 10520520@unknown@formal@none@1@S@;Approximate inference techniques@@@@1@3@@danf@17-8-2009 10520530@unknown@formal@none@1@S@*[[Monte Carlo method]]s@@@@1@3@@danf@17-8-2009 10520540@unknown@formal@none@1@S@*[[Variational Bayes]]@@@@1@2@@danf@17-8-2009 10520550@unknown@formal@none@1@S@*[[Variable-order Markov model]]s@@@@1@3@@danf@17-8-2009 10520560@unknown@formal@none@1@S@*[[Variable-order Bayesian network]]s@@@@1@3@@danf@17-8-2009 10520570@unknown@formal@none@1@S@*[[Loopy belief propagation]]@@@@1@3@@danf@17-8-2009 10520580@unknown@formal@none@1@S@;Optimization@@@@1@1@@danf@17-8-2009 10520590@unknown@formal@none@1@S@*Most of the methods listed above either use [[Optimization (mathematics)|optimization]] or are instances of optimization algorithms@@@@1@15@@danf@17-8-2009 10520600@unknown@formal@none@1@S@;Meta-learning (ensemble methods)@@@@1@3@@danf@17-8-2009 10520610@unknown@formal@none@1@S@*[[Boosting]]@@@@1@1@@danf@17-8-2009 10520620@unknown@formal@none@1@S@*[[Bootstrap aggregating]]@@@@1@2@@danf@17-8-2009 10520630@unknown@formal@none@1@S@*[[Random forest]]@@@@1@2@@danf@17-8-2009 10520640@unknown@formal@none@1@S@*[[Weighted majority algorithm]]@@@@1@3@@danf@17-8-2009 10520650@unknown@formal@none@1@S@;Inductive transfer and learning to learn@@@@1@6@@danf@17-8-2009 10520660@unknown@formal@none@1@S@*[[Inductive transfer]]@@@@1@2@@danf@17-8-2009 10520670@unknown@formal@none@1@S@*[[Reinforcement learning]]@@@@1@2@@danf@17-8-2009 10520680@unknown@formal@none@1@S@*[[Temporal difference learning]]@@@@1@3@@danf@17-8-2009 10520690@unknown@formal@none@1@S@*[[Monte-Carlo method]]@@@@1@2@@danf@17-8-2009 10530010@unknown@formal@none@1@S@
Machine translation
@@@@1@2@@danf@17-8-2009 10530020@unknown@formal@none@1@S@Machine translation''', sometimes referred to by the abbreviation '''MT''', is a sub-field of [[computational linguistics]] that investigates the use of [[computer software]] to [[translation|translate]] text or speech from one [[natural language]] to another.@@@@1@33@@danf@17-8-2009 10530030@unknown@formal@none@1@S@At its basic level, MT performs simple [[substitution]] of words in one natural language for words in another.@@@@1@18@@danf@17-8-2009 10530040@unknown@formal@none@1@S@Using [[corpus linguistics|corpus]] techniques, more complex translations may be attempted, allowing for better handling of differences in [[linguistic typology]], phrase [[recognition]], and translation of [[idiom]]s, as well as the isolation of anomalies.@@@@1@32@@danf@17-8-2009 10530050@unknown@formal@none@1@S@Current machine translation software often allows for customisation by domain or [[profession]] (such as [[meteorology|weather reports]]) — improving output by limiting the scope of allowable substitutions.@@@@1@26@@danf@17-8-2009 10530060@unknown@formal@none@1@S@This technique is particularly effective in domains where formal or formulaic language is used.@@@@1@14@@danf@17-8-2009 10530070@unknown@formal@none@1@S@It follows then that machine translation of government and legal documents more readily produces usable output than conversation or less standardised text.@@@@1@22@@danf@17-8-2009 10530080@unknown@formal@none@1@S@Improved output quality can also be achieved by human intervention: for example, some systems are able to translate more accurately if the user has [[word sense disambiguation|unambiguously identified]] which words in the text are names.@@@@1@35@@danf@17-8-2009 10530090@unknown@formal@none@1@S@With the assistance of these techniques, MT has proven useful as a tool to assist human translators, and in some cases can even produce output that can be used "as is".@@@@1@31@@danf@17-8-2009 10530100@unknown@formal@none@1@S@However, current systems are unable to produce output of the same quality as a human translator, particularly where the text to be translated uses casual language.@@@@1@26@@danf@17-8-2009 10530110@unknown@formal@none@1@S@==History==@@@@1@1@@danf@17-8-2009 10530120@unknown@formal@none@1@S@The history of machine translation begins in the 1950s, after [[World War II]].@@@@1@13@@danf@17-8-2009 10530130@unknown@formal@none@1@S@The [[Georgetown-IBM experiment|Georgetown experiment]] (1954) involved fully-automatic translation of over sixty [[Russian language|Russian]] sentences into [[English language|English]].@@@@1@17@@danf@17-8-2009 10530140@unknown@formal@none@1@S@The experiment was a great success and ushered in an era of substantial funding for machine-translation research.@@@@1@17@@danf@17-8-2009 10530150@unknown@formal@none@1@S@The authors claimed that within three to five years, machine translation would be a solved problem.@@@@1@16@@danf@17-8-2009 10530160@unknown@formal@none@1@S@Real progress was much slower, however, and after the [[ALPAC|ALPAC report]] (1966), which found that the ten-year-long research had failed to fulfill expectations, funding was greatly reduced.@@@@1@27@@danf@17-8-2009 10530170@unknown@formal@none@1@S@Beginning in the late 1980s, as [[computation]]al power increased and became less expensive, more interest was shown in [[statistical machine translation|statistical models for machine translation]].@@@@1@25@@danf@17-8-2009 10530180@unknown@formal@none@1@S@The idea of using digital computers 
for translation of natural languages was proposed as early as 1946 by A.D.Booth and possibly others.@@@@1@22@@danf@17-8-2009 10530190@unknown@formal@none@1@S@The Georgetown experiment was by no means the first such application, and a demonstration was made in 1954 on the APEXC machine at Birkbeck College (London Univ.) of a rudimentary translation of English into French.@@@@1@35@@danf@17-8-2009 10530200@unknown@formal@none@1@S@Several papers on the topic were published at the time, and even articles in popular journals (see for example Wireless World, Sept. 1955, Cleave and Zacharov).@@@@1@26@@danf@17-8-2009 10530210@unknown@formal@none@1@S@A similar application, also pioneered at Birkbeck College at the time, was reading and composing Braille texts by computer.@@@@1@19@@danf@17-8-2009 10530220@unknown@formal@none@1@S@Recently, Internet has emerged as global information infrastructure, revolutionizing access to any information, as well as fast information transfer and exchange.@@@@1@21@@danf@17-8-2009 10530230@unknown@formal@none@1@S@Using Internet and e-mail technology, people need to communicate rapidly over long distances across continent boundaries.@@@@1@16@@danf@17-8-2009 10530240@unknown@formal@none@1@S@Not all of these Internet users, however, can use their own language for global communication to different people with different languages.@@@@1@21@@danf@17-8-2009 10530250@unknown@formal@none@1@S@Therefore, using machine translation software, people can possibly communicate and contact one to another around the world in their own mother tongue, in the near future.@@@@1@26@@danf@17-8-2009 10530260@unknown@formal@none@1@S@==Translation process==@@@@1@2@@danf@17-8-2009 10530270@unknown@formal@none@1@S@The [[translation process]] may be stated as:@@@@1@7@@danf@17-8-2009 10530280@unknown@formal@none@1@S@# [[Decoding]] the [[meaning (linguistic)|meaning]] of the [[source text]]; and@@@@1@10@@danf@17-8-2009 10530290@unknown@formal@none@1@S@# Re-[[encoding]] this [[meaning (linguistic)|meaning]] in the [[target language]].@@@@1@9@@danf@17-8-2009 10530300@unknown@formal@none@1@S@Behind this ostensibly simple procedure lies a complex [[cognitive]] operation.@@@@1@10@@danf@17-8-2009 10530310@unknown@formal@none@1@S@To decode the meaning of the [[source text]] in its entirety, the translator must interpret and analyse all the features of the text, a process that requires in-depth knowledge of the [[grammar]], [[semantics]], [[syntax]], [[idiom]]s, etc., of the [[source language]], as well as the [[culture]] of its speakers.@@@@1@48@@danf@17-8-2009 10530320@unknown@formal@none@1@S@The translator needs the same in-depth knowledge to re-encode the meaning in the [[target language]].@@@@1@15@@danf@17-8-2009 10530330@unknown@formal@none@1@S@Therein lies the challenge in machine translation: how to program a computer that will "understand" a text as a person does, and that will "create" a new text in the [[target language]] that "sounds" as if it has been written by a person.@@@@1@43@@danf@17-8-2009 10530340@unknown@formal@none@1@S@This problem may be approached in a number of ways.@@@@1@10@@danf@17-8-2009 10530350@unknown@formal@none@1@S@==Approaches==@@@@1@1@@danf@17-8-2009 10530360@unknown@formal@none@1@S@Machine translation can use a method based on [[Expert System|linguistic rules]], which means that words will be translated in a linguistic way — the most suitable (orally speaking) words of the target language will replace the ones in the source language.@@@@1@41@@danf@17-8-2009 
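A toy illustration of this most basic, word-for-word level of substitution is sketched below. The five-entry English-to-French lexicon is an invented example rather than data from any real system; actual rule-based engines add large lexicons, morphological analysis and transfer rules, none of which appear here.

```python
import re

# Illustrative sketch of translation by naive word substitution, the "basic level"
# described above. The tiny lexicon is hypothetical; mapping "the" to "le" alone
# already ignores gender and number agreement (le/la/les).
TOY_LEXICON = {
    "the": "le",
    "cat": "chat",
    "sleeps": "dort",
    "on": "sur",
    "mat": "tapis",
}

def substitute_translate(sentence):
    tokens = re.findall(r"\w+|[^\w\s]", sentence.lower())
    # Words missing from the lexicon pass through unchanged; nothing here handles
    # agreement, word order, multi-word expressions or idioms.
    return " ".join(TOY_LEXICON.get(tok, tok) for tok in tokens)

if __name__ == "__main__":
    print(substitute_translate("The cat sleeps on the mat."))
    # -> "le chat dort sur le tapis ."  (punctuation spacing left naive on purpose)
```

Even on this tiny example the weaknesses that motivate richer approaches are visible: simple substitution cannot capture agreement, word order or idioms, which is exactly what the transfer-based, interlingual and statistical methods discussed below try to address.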
10530370@unknown@formal@none@1@S@It is often argued that the success of machine translation requires the problem of [[natural language processing|natural language understanding]] to be solved first.@@@@1@23@@danf@17-8-2009 10530380@unknown@formal@none@1@S@Generally, rule-based methods parse a text, usually creating an intermediary, symbolic representation, from which the text in the target language is generated.@@@@1@22@@danf@17-8-2009 10530390@unknown@formal@none@1@S@According to the nature of the intermediary representation, an approach is described as [[interlingual machine translation]] or [[transfer-based machine translation]].@@@@1@20@@danf@17-8-2009 10530400@unknown@formal@none@1@S@These methods require extensive [[lexicon]]s with [[morphology (linguistics)|morphological]], [[syntax|syntactic]], and [[semantics|semantic]] information, and large sets of rules.@@@@1@17@@danf@17-8-2009 10530410@unknown@formal@none@1@S@Given enough data, machine translation programs often work well enough for a [[native speaker]] of one language to get the approximate meaning of what is written by the other native speaker.@@@@1@31@@danf@17-8-2009 10530420@unknown@formal@none@1@S@The difficulty is getting enough data of the right kind to support the particular method.@@@@1@15@@danf@17-8-2009 10530430@unknown@formal@none@1@S@For example, the large multilingual [[Text corpus|corpus]] of data needed for statistical methods to work is not necessary for the grammar-based methods.@@@@1@22@@danf@17-8-2009 10530440@unknown@formal@none@1@S@But then, the grammar methods need a skilled linguist to carefully design the grammar that they use.@@@@1@17@@danf@17-8-2009 10530450@unknown@formal@none@1@S@To translate between closely related languages, a technique referred to as [[shallow-transfer machine translation]] may be used.@@@@1@17@@danf@17-8-2009 10530460@unknown@formal@none@1@S@===Rule-based===@@@@1@1@@danf@17-8-2009 10530470@unknown@formal@none@1@S@The rule-based machine translation paradigm includes transfer-based machine translation, interlingual machine translation and dictionary-based machine translation paradigms.@@@@1@17@@danf@17-8-2009 10530480@unknown@formal@none@1@S@'''''Transfer-based machine translation'''''@@@@1@3@@danf@17-8-2009 10530490@unknown@formal@none@1@S@'''''Interlingual'''''@@@@1@1@@danf@17-8-2009 10530500@unknown@formal@none@1@S@Interlingual machine translation is one instance of rule-based machine-translation approaches.@@@@1@10@@danf@17-8-2009 10530510@unknown@formal@none@1@S@In this approach, the source language, i.e. the text to be translated, is transformed into an interlingual, i.e. 
source-/target-language-independent representation.@@@@1@20@@danf@17-8-2009 10530520@unknown@formal@none@1@S@The target language is then generated out of the [[interlinguistics|interlingua]].@@@@1@10@@danf@17-8-2009 10530530@unknown@formal@none@1@S@'''''Dictionary-based'''''@@@@1@1@@danf@17-8-2009 10530540@unknown@formal@none@1@S@Machine translation can use a method based on [[dictionary]] entries, which means that the words will be translated as they are by a dictionary.@@@@1@24@@danf@17-8-2009 10530550@unknown@formal@none@1@S@===Statistical===@@@@1@1@@danf@17-8-2009 10530560@unknown@formal@none@1@S@Statistical machine translation tries to generate translations using [[statistical methods]] based on bilingual text corpora, such as the [[Hansard#Canadian hansard and machine translation|Canadian Hansard]] corpus, the English-French record of the Canadian parliament and [[EUROPARL]], the record of the [[European Parliament]].@@@@1@40@@danf@17-8-2009 10530570@unknown@formal@none@1@S@Where such corpora are available, impressive results can be achieved translating texts of a similar kind, but such corpora are still very rare.@@@@1@23@@danf@17-8-2009 10530580@unknown@formal@none@1@S@The first statistical machine translation software was [[CANDIDE]] from [[IBM]].@@@@1@10@@danf@17-8-2009 10530590@unknown@formal@none@1@S@Google used [[SYSTRAN]] for several years, but has switched to a statistical translation method in October 2007.@@@@1@17@@danf@17-8-2009 10530600@unknown@formal@none@1@S@Recently, they improved their translation capabilities by inputting approximately 200 billion words from [[United Nations]] materials to train their system.@@@@1@20@@danf@17-8-2009 10530610@unknown@formal@none@1@S@Accuracy of the translation has improved.@@@@1@6@@danf@17-8-2009 10530620@unknown@formal@none@1@S@===Example-based===@@@@1@1@@danf@17-8-2009 10530630@unknown@formal@none@1@S@Example-based machine translation (EBMT) approach is often characterised by its use of a bilingual [[corpus]] as its main knowledge base, at run-time.@@@@1@22@@danf@17-8-2009 10530640@unknown@formal@none@1@S@It is essentially a translation by [[analogy]] and can be viewed as an implementation of [[case-based reasoning]] approach of [[machine learning]].@@@@1@21@@danf@17-8-2009 10530650@unknown@formal@none@1@S@==Major issues==@@@@1@2@@danf@17-8-2009 10530660@unknown@formal@none@1@S@===Disambiguation===@@@@1@1@@danf@17-8-2009 10530670@unknown@formal@none@1@S@Word sense disambiguation concerns finding a suitable translation when a word can have more than one meaning.@@@@1@17@@danf@17-8-2009 10530680@unknown@formal@none@1@S@The problem was first raised in the 1950s by [[Yehoshua Bar-Hillel]].@@@@1@11@@danf@17-8-2009 10530690@unknown@formal@none@1@S@He pointed out that without a "universal encyclopedia", a machine would never be able to distinguish between the two meanings of a word.@@@@1@23@@danf@17-8-2009 10530700@unknown@formal@none@1@S@Today there are numerous approaches designed to overcome this problem.@@@@1@10@@danf@17-8-2009 10530710@unknown@formal@none@1@S@They can be approximately divided into "shallow" approaches and "deep" approaches.@@@@1@11@@danf@17-8-2009 10530720@unknown@formal@none@1@S@Shallow approaches assume no knowledge of the text.@@@@1@8@@danf@17-8-2009 10530730@unknown@formal@none@1@S@They simply apply statistical methods to the words surrounding the ambiguous word.@@@@1@12@@danf@17-8-2009 10530740@unknown@formal@none@1@S@Deep approaches presume a comprehensive knowledge of the word.@@@@1@9@@danf@17-8-2009 
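A minimal sketch of the "shallow" idea is given below: it chooses between two candidate translations of an ambiguous word by counting how many of the surrounding context words overlap with hand-written cue words for each sense. The ambiguous word, the German renderings and the cue lists are invented for the illustration; practical systems estimate such associations statistically from large corpora rather than listing them by hand.

```python
# Illustrative sketch of shallow word sense disambiguation by context-word overlap.
# Hypothetical entry for the ambiguous English noun "bank" when translating to German.
SENSE_CUES = {
    "Bank (financial institution)": {"money", "account", "loan", "deposit", "interest"},
    "Ufer (river bank)": {"river", "water", "fishing", "shore", "muddy"},
}

def disambiguate(context_words):
    """Return the sense whose cue words overlap most with the context, plus scores."""
    context = {w.lower() for w in context_words}
    scores = {sense: len(cues & context) for sense, cues in SENSE_CUES.items()}
    # Ties or all-zero scores would need a fallback, e.g. the most frequent sense.
    return max(scores, key=scores.get), scores

if __name__ == "__main__":
    sentence = "She opened an account at the bank to deposit money".split()
    print(disambiguate(sentence))
    # -> ('Bank (financial institution)', {...: 3, 'Ufer (river bank)': 0})
```

Counting overlaps in this way uses no understanding of the text beyond the words surrounding the ambiguous one, which is what makes it a shallow method; a deep approach would instead draw on broader knowledge of what accounts, rivers and banks are.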
10530750@unknown@formal@none@1@S@So far, shallow approaches have been more successful.@@@@1@8@@danf@17-8-2009 10530760@unknown@formal@none@1@S@===Named entities===@@@@1@2@@danf@17-8-2009 10530770@unknown@formal@none@1@S@Related to [[named entity recognition]] in [[information extraction]].@@@@1@8@@danf@17-8-2009 10530780@unknown@formal@none@1@S@==Applications==@@@@1@1@@danf@17-8-2009 10530790@unknown@formal@none@1@S@There are now many [[software]] programs for translating natural language, several of them [[online]], such as the [[SYSTRAN]] system which powers both [[Google]] translate and [[AltaVista]]'s [[Babel Fish (website)|Babel Fish]] as well as [[Promt]] that powers online translation services at Voila.fr and Orange.fr.@@@@1@43@@danf@17-8-2009 10530800@unknown@formal@none@1@S@Although no system provides the holy grail of "fully automatic high quality machine translation" (FAHQMT), many systems produce reasonable output.@@@@1@20@@danf@17-8-2009 10530810@unknown@formal@none@1@S@Despite their inherent limitations, MT programs are used around the world.@@@@1@11@@danf@17-8-2009 10530820@unknown@formal@none@1@S@Probably the largest institutional user is the [[European Commission]].@@@@1@9@@danf@17-8-2009 10530830@unknown@formal@none@1@S@[[Toggletext]] uses a transfer-based system (known as Kataku) to translate between [[English language|English]] and [[Indonesian language|Indonesian]].@@@@1@16@@danf@17-8-2009 10530840@unknown@formal@none@1@S@[[Google]] has claimed that promising results were obtained using a proprietary statistical machine translation engine.@@@@1@15@@danf@17-8-2009 10530850@unknown@formal@none@1@S@The statistical translation engine used in the [[Google tools#anchor_language_tools|Google language tools]] for Arabic <-> English and Chinese <-> English has an overall score of 0.4281 over the runner-up IBM's BLEU-4 score of 0.3954 (Summer 2006) in tests conducted by the National Institute for Standards and Technology.@@@@1@46@@danf@17-8-2009 10530860@unknown@formal@none@1@S@[[Uwe Muegge]] has implemented a demo website that uses a [[controlled language]] in combination with the [[Google tools#anchor_language_tools|Google tool]] to produce fully automatic, high-quality machine translations of his English, German, and French web sites.@@@@1@34@@danf@17-8-2009 10530870@unknown@formal@none@1@S@With the recent focus on terrorism, the military sources in the United States have been investing significant amounts of money in natural language engineering.@@@@1@24@@danf@17-8-2009 10530880@unknown@formal@none@1@S@''In-Q-Tel'' (a [[venture capital]] fund, largely funded by the US Intelligence Community, to stimulate new technologies through private sector entrepreneurs) brought up companies like [[Language Weaver]].@@@@1@26@@danf@17-8-2009 10530890@unknown@formal@none@1@S@Currently the military community is interested in translation and processing of languages like [[Arabic language|Arabic]], [[Pashto language|Pashto]], and [[Dari language|Dari]].@@@@1@20@@danf@17-8-2009 10530900@unknown@formal@none@1@S@Information Processing Technology Office in [[DARPA]] hosts programs like [[DARPA TIDES program|TIDES]] and [[Babylon translator|Babylon Translator]].@@@@1@16@@danf@17-8-2009 10530910@unknown@formal@none@1@S@US Air Force has awarded a $1 million contract to develop a language translation technology.@@@@1@15@@danf@17-8-2009 10530920@unknown@formal@none@1@S@== Evaluation ==@@@@1@3@@danf@17-8-2009 10530930@unknown@formal@none@1@S@There are various means for evaluating the performance of 
machine-translation systems.@@@@1@11@@danf@17-8-2009 10530940@unknown@formal@none@1@S@The oldest is the use of human judges to assess a translation's quality.@@@@1@13@@danf@17-8-2009 10530950@unknown@formal@none@1@S@Even though human evaluation is time-consuming, it is still the most reliable way to compare different systems such as rule-based and statistical systems.@@@@1@23@@danf@17-8-2009 10530960@unknown@formal@none@1@S@[[Automate]]d means of evaluation include [[Bilingual evaluation understudy|BLEU]], [[NIST (metric)|NIST]] and [[METEOR]].@@@@1@12@@danf@17-8-2009 10530970@unknown@formal@none@1@S@Relying exclusively on machine translation ignores that communication in [[natural language|human language]] is [[wiktionary:context|context]]-embedded, and that it takes a human to adequately comprehend the context of the original text.@@@@1@29@@danf@17-8-2009 10530980@unknown@formal@none@1@S@Even purely human-generated translations are prone to error.@@@@1@8@@danf@17-8-2009 10530990@unknown@formal@none@1@S@Therefore, to ensure that a machine-generated translation will be of publishable quality and useful to a human, it must be reviewed and edited by a human.@@@@1@26@@danf@17-8-2009 10531000@unknown@formal@none@1@S@It has, however, been asserted that in certain applications, e.g. product descriptions written in a [[controlled language]], a [[dictionary-based machine translation|dictionary-based machine-translation]] system has produced satisfactory translations that require no human intervention.@@@@1@32@@danf@17-8-2009 10540010@unknown@formal@none@1@S@
Metadata
@@@@1@1@@danf@17-8-2009 10540020@unknown@formal@none@1@S@'''Metadata''' ('''meta data''', or sometimes '''metainformation''') is "data about data", of any sort in any media.@@@@1@16@@danf@17-8-2009 10540030@unknown@formal@none@1@S@An item of metadata may describe an individual [[datum]], or content item, or a collection of data including multiple content items and hierarchical levels, for example a [[database schema]].@@@@1@29@@danf@17-8-2009 10540040@unknown@formal@none@1@S@== Purpose ==@@@@1@3@@danf@17-8-2009 10540050@unknown@formal@none@1@S@Metadata provides context for data.@@@@1@5@@danf@17-8-2009 10540060@unknown@formal@none@1@S@Metadata is used to facilitate the understanding, characteristics, and management usage of data.@@@@1@13@@danf@17-8-2009 10540070@unknown@formal@none@1@S@The metadata required for effective data management varies with the type of data and context of use.@@@@1@17@@danf@17-8-2009 10540080@unknown@formal@none@1@S@In a [[library]], where the data is the content of the titles stocked, metadata about a title would typically include a description of the content, the [[author]], the publication date and the physical location.@@@@1@34@@danf@17-8-2009 10540090@unknown@formal@none@1@S@== Examples of Metadata ==@@@@1@5@@danf@17-8-2009 10540100@unknown@formal@none@1@S@=== Camera ===@@@@1@3@@danf@17-8-2009 10540110@unknown@formal@none@1@S@In the context of a [[camera]], where the data is the photographic image, metadata would typically include the date the [[photograph]] was taken and details of the camera settings (lens, focal length, aperture, shutter timing, white balance, etc.).@@@@1@38@@danf@17-8-2009 10540120@unknown@formal@none@1@S@=== Digital Music Player ===@@@@1@5@@danf@17-8-2009 10540130@unknown@formal@none@1@S@On a digital portable music player, the album names, song titles and album art embedded in the music files are used to generate the artist and song listings, and are considered the metadata.@@@@1@33@@danf@17-8-2009 10540140@unknown@formal@none@1@S@=== Information system ===@@@@1@4@@danf@17-8-2009 10540150@unknown@formal@none@1@S@In the context of an [[information system]], where the data is the content of the [[computer]] files, metadata about an individual data item would typically include the name of the field and its length.@@@@1@34@@danf@17-8-2009 10540160@unknown@formal@none@1@S@Metadata about a collection of data items, a computer file, might typically include the name of the file, the type of file and the name of the data administrator.@@@@1@29@@danf@17-8-2009 10540170@unknown@formal@none@1@S@''Italic text''@@@@1@2@@danf@17-8-2009 10540180@unknown@formal@none@1@S@=== Real world location ===@@@@1@5@@danf@17-8-2009 10540190@unknown@formal@none@1@S@If we consider a particular place in the real world, this may be described by data, for example:@@@@1@18@@danf@17-8-2009 10540200@unknown@formal@none@1@S@* 1 "E83BJ" .@@@@1@4@@danf@17-8-2009 10540210@unknown@formal@none@1@S@* 2 "17"@@@@1@3@@danf@17-8-2009 10540220@unknown@formal@none@1@S@* 3 "Sunny"@@@@1@3@@danf@17-8-2009 10540230@unknown@formal@none@1@S@To make sense of and use this data, context is important, and can be provided by metadata.@@@@1@17@@danf@17-8-2009 10540240@unknown@formal@none@1@S@The metadata for the above three items of data might include:@@@@1@11@@danf@17-8-2009 10540250@unknown@formal@none@1@S@* 1.1 "Post Code" – This is a brief description (or name) of the data item "E83BJ"@@@@1@17@@danf@17-8-2009 10540260@unknown@formal@none@1@S@* 1.2 "The unique identifier of a postal district" – 
This is another description (a definition) of "E83BJ"@@@@1@18@@danf@17-8-2009 10540270@unknown@formal@none@1@S@* 1.3 "27 June 2006" – This could also help describe "E83BJ", for example by giving the date it was last updated@@@@1@22@@danf@17-8-2009 10540280@unknown@formal@none@1@S@* 2 "Average temperature in degrees Celsius" – This is a possible description of "17"@@@@1@15@@danf@17-8-2009 10540290@unknown@formal@none@1@S@* 3 "Yesterday's weather" – This is a description of "sunny"@@@@1@11@@danf@17-8-2009 10540300@unknown@formal@none@1@S@An item of metadata is itself data and therefore may have its own metadata.@@@@1@14@@danf@17-8-2009 10540310@unknown@formal@none@1@S@For example, "Post Code" might have the following metadata:@@@@1@9@@danf@17-8-2009 10540320@unknown@formal@none@1@S@* 1.1.1 "data item name"@@@@1@5@@danf@17-8-2009 10540330@unknown@formal@none@1@S@* 1.1.2 "5 characters, starting with A – Z"@@@@1@9@@danf@17-8-2009 10540340@unknown@formal@none@1@S@"27 June 2006" might have the following metadata:@@@@1@8@@danf@17-8-2009 10540350@unknown@formal@none@1@S@* 1.3.1 "date last changed"@@@@1@5@@danf@17-8-2009 10540360@unknown@formal@none@1@S@* 1.3.2 "dd MMM yyyy"@@@@1@5@@danf@17-8-2009 10540370@unknown@formal@none@1@S@== Levels ==@@@@1@3@@danf@17-8-2009 10540380@unknown@formal@none@1@S@The hierarchy of metadata descriptions can go on forever, but usually context or semantic understanding makes extensively detailed explanations unnecessary.@@@@1@20@@danf@17-8-2009 10540390@unknown@formal@none@1@S@The role played by any particular [[datum]] depends on the context.@@@@1@11@@danf@17-8-2009 10540400@unknown@formal@none@1@S@For example, when considering the geography of London, "E83BJ" would be a datum and "Post Code" would be metadatum.@@@@1@19@@danf@17-8-2009 10540410@unknown@formal@none@1@S@But, when considering the data management of an automated system that manages geographical data, "Post Code" might be a datum and then "data item name" and "5 characters, starting with A – Z" would be metadata.@@@@1@36@@danf@17-8-2009 10540420@unknown@formal@none@1@S@In any particular context, metadata characterizes the data it describes, not the entity described by that data.@@@@1@17@@danf@17-8-2009 10540430@unknown@formal@none@1@S@So, in relation to "E83BJ", the datum "is in London" is a further description of the place in the real world which has the post code "E83BJ", not of the code itself.@@@@1@32@@danf@17-8-2009 10540440@unknown@formal@none@1@S@Therefore, although it is providing information connected to "E83BJ" (telling us that this is the post code of a place in London), this would not normally be considered metadata, as it is describing "E83BJ" ''qua'' place in the real world and not ''qua'' data.@@@@1@44@@danf@17-8-2009 10540450@unknown@formal@none@1@S@== Definitions ==@@@@1@3@@danf@17-8-2009 10540460@unknown@formal@none@1@S@=== Etymology ===@@@@1@3@@danf@17-8-2009 10540470@unknown@formal@none@1@S@[[Meta]] is a classical Greek preposition (μετ’ αλλων εταιρων) and prefix (μεταβασις) conveying the following senses in English, depending upon the case of the associated noun: among; along with; with; by means of; in the midst of; after; behind.@@@@1@39@@danf@17-8-2009 10540480@unknown@formal@none@1@S@In [[epistemology]], the word means "about (its own category)"; thus metadata is "data about the data".@@@@1@16@@danf@17-8-2009 10540490@unknown@formal@none@1@S@=== Varying definitions ===@@@@1@4@@danf@17-8-2009 10540500@unknown@formal@none@1@S@The term was introduced intuitively, without a formal 
definition.@@@@1@9@@danf@17-8-2009 10540510@unknown@formal@none@1@S@Because of that, today there are various definitions.@@@@1@8@@danf@17-8-2009 10540520@unknown@formal@none@1@S@The most common one is the literal translation:@@@@1@8@@danf@17-8-2009 10540530@unknown@formal@none@1@S@* "Data about data are referred to as metadata."@@@@1@9@@danf@17-8-2009 10540540@unknown@formal@none@1@S@Example: "12345" is data, and with no additional context is meaningless.@@@@1@11@@danf@17-8-2009 10540550@unknown@formal@none@1@S@When "12345" is given a meaningful name (metadata) of "[[ZIP code]]", one can understand (at least in the [[United States]], and further placing "ZIP code" within the context of a [[postal address]]) that "12345" refers to the [[General Electric]] plant in [[Schenectady, New York]].@@@@1@44@@danf@17-8-2009 10540560@unknown@formal@none@1@S@As for most people the difference between data and [[information]] is merely a [[philosophical]] one of no relevance in practical use, other definitions are:@@@@1@24@@danf@17-8-2009 10540570@unknown@formal@none@1@S@* Metadata is information about data.@@@@1@6@@danf@17-8-2009 10540580@unknown@formal@none@1@S@* Metadata is information about information.@@@@1@6@@danf@17-8-2009 10540590@unknown@formal@none@1@S@* Metadata contains information about that data or other data@@@@1@10@@danf@17-8-2009 10540600@unknown@formal@none@1@S@There are more sophisticated definitions, such as:@@@@1@7@@danf@17-8-2009 10540610@unknown@formal@none@1@S@*"Metadata is structured, encoded data that describe characteristics of information-bearing entities to aid in the identification, discovery, assessment, and management of the described entities."@@@@1@24@@danf@17-8-2009 10540620@unknown@formal@none@1@S@* "[Metadata is a set of] optional structured descriptions that are publicly available to explicitly assist in locating objects."@@@@1@19@@danf@17-8-2009 10540630@unknown@formal@none@1@S@These are used more rarely because they tend to concentrate on one purpose of metadata — to find "objects", "entities" or "resources" — and ignore others, such as using metadata to optimize [[data compression|compression algorithms]], or to perform additional computations using the data.@@@@1@43@@danf@17-8-2009 10540640@unknown@formal@none@1@S@The metadata concept has been extended into the world of systems to include any "data about data": the names of tables, columns, programs, and the like.@@@@1@26@@danf@17-8-2009 10540650@unknown@formal@none@1@S@Different views of this "system metadata" are detailed below, but beyond that is the recognition that metadata can describe all aspects of systems: data, activities, people and organizations involved, locations of data and processes, access methods, limitations, timing and events, as well as motivation and rules.@@@@1@46@@danf@17-8-2009 10540660@unknown@formal@none@1@S@Fundamentally, then, metadata is "the data that describe the structure and workings of an organization's use of information, and which describe the systems it uses to manage that information".@@@@1@29@@danf@17-8-2009 10540670@unknown@formal@none@1@S@To do a model of metadata is to do an "[[Enterprise modeling|Enterprise model]]" of the information technology industry itself.@@@@1@19@@danf@17-8-2009 10540680@unknown@formal@none@1@S@=== Metadata and Markup ===@@@@1@5@@danf@17-8-2009 10540690@unknown@formal@none@1@S@In the context of the web and the work of the [[W3C]] in providing markup technologies of [[HTML]], [[XML]] and [[SGML]] the concept of metadata has specific context that 
is perhaps clearer than in other information domains.@@@@1@37@@danf@17-8-2009 10540700@unknown@formal@none@1@S@With markup technologies there is metadata, markup and data content.@@@@1@10@@danf@17-8-2009 10540710@unknown@formal@none@1@S@The metadata describes characteristics about the data, while the markup identifies the specific type of data content and acts as a container for that document instance.@@@@1@26@@danf@17-8-2009 10540720@unknown@formal@none@1@S@This page in Wikipedia is itself an example of such usage, where the textual information is data, how it is packaged, linked, referenced, styled and displayed is markup, and aspects and characteristics of that markup are metadata set globally across Wikipedia.@@@@1@41@@danf@17-8-2009 10540730@unknown@formal@none@1@S@In the context of markup the metadata is architected to allow optimization of document instances to contain only a minimum amount of metadata, while the metadata itself is likely referenced externally such as in a [[schema]] definition ([[XSD]]) instance.@@@@1@39@@danf@17-8-2009 10540740@unknown@formal@none@1@S@Markup also provides specialised mechanisms that handle referential data, again avoiding confusion over what is metadata and what is data, and allowing optimizations.@@@@1@26@@danf@17-8-2009 10540750@unknown@formal@none@1@S@The reference and ID mechanisms in markup allow reference links between related data items, and links to data items that can then be repeated about a data item, such as an address or product details.@@@@1@35@@danf@17-8-2009 10540760@unknown@formal@none@1@S@These are then all themselves simply more data items and markup instances rather than metadata.@@@@1@15@@danf@17-8-2009 10540770@unknown@formal@none@1@S@Similarly there are concepts such as classifications, ontologies and associations for which markup mechanisms are provided.@@@@1@16@@danf@17-8-2009 10540780@unknown@formal@none@1@S@A data item can then be linked to such categories via markup, providing a clean delineation between what is metadata and what are actual data instances.@@@@1@26@@danf@17-8-2009 10540790@unknown@formal@none@1@S@Therefore, the concepts and descriptions in a classification would be metadata, but the actual classification entry for a data item is simply another data instance.@@@@1@25@@danf@17-8-2009 10540800@unknown@formal@none@1@S@Some examples can illustrate the points here.@@@@1@7@@danf@17-8-2009 10540810@unknown@formal@none@1@S@Items in bold are data content, items in italic are metadata, and normal text items are markup.@@@@1@16@@danf@17-8-2009 10540820@unknown@formal@none@1@S@The two examples show in-line use of metadata within markup relating to a data instance (XML) compared to simple markup (HTML).@@@@1@21@@danf@17-8-2009 10540830@unknown@formal@none@1@S@A simple [[HTML]] instance example:@@@@1@5@@danf@17-8-2009 10540840@unknown@formal@none@1@S@<span style="normalText">'''Example'''</span>@@@@1@2@@danf@17-8-2009 10540850@unknown@formal@none@1@S@And then an [[XML]] instance example with metadata:@@@@1@8@@danf@17-8-2009 10540860@unknown@formal@none@1@S@'''John'''@@@@1@2@@danf@17-8-2009 10540870@unknown@formal@none@1@S@Here the inline assertion that a person's middle name may be an empty data item is metadata about the data item.@@@@1@21@@danf@17-8-2009 10540880@unknown@formal@none@1@S@Such definitions, however, are usually not placed inline in XML.@@@@1@10@@danf@17-8-2009 10540890@unknown@formal@none@1@S@Instead these definitions are moved away into the [[schema]] definition that contains the metadata for 
the entire document instance.@@@@1@19@@danf@17-8-2009 10540900@unknown@formal@none@1@S@This again illustrates another important aspect of metadata in the context of markup.@@@@1@13@@danf@17-8-2009 10540910@unknown@formal@none@1@S@The metadata is optimally defined only once for a collection of data instances.@@@@1@13@@danf@17-8-2009 10540920@unknown@formal@none@1@S@Hence repeated items of markup are rarely metadata, but rather more markup data instances themselves.@@@@1@15@@danf@17-8-2009 10540930@unknown@formal@none@1@S@=== Hierarchies of metadata ===@@@@1@5@@danf@17-8-2009 10540940@unknown@formal@none@1@S@When structured into a hierarchical arrangement, metadata is more properly called an [[Ontology (computer science)|ontology]] or [[schema]].@@@@1@17@@danf@17-8-2009 10540950@unknown@formal@none@1@S@Both terms describe "what exists" for some purpose or to enable some action.@@@@1@13@@danf@17-8-2009 10540960@unknown@formal@none@1@S@For instance, the arrangement of subject headings in a library catalog serves not only as a guide to finding books on a particular subject in the stacks, but also as a guide to what subjects "exist" in the library's own ontology and how more specialized topics are related to or derived from the more general subject headings.@@@@1@57@@danf@17-8-2009 10540970@unknown@formal@none@1@S@Metadata is frequently stored in a central location and used to help organizations standardize their data.@@@@1@16@@danf@17-8-2009 10540980@unknown@formal@none@1@S@This information is typically stored in a [[metadata registry]].@@@@1@9@@danf@17-8-2009 10540990@unknown@formal@none@1@S@=== Difference between data and metadata ===@@@@1@7@@danf@17-8-2009 10541000@unknown@formal@none@1@S@Usually it is not possible to distinguish between (plain) data and metadata because:@@@@1@13@@danf@17-8-2009 10541010@unknown@formal@none@1@S@*Something can be data and metadata at the same time.@@@@1@10@@danf@17-8-2009 10541020@unknown@formal@none@1@S@The headline of an article is both its title (metadata) and part of its text (data).@@@@1@16@@danf@17-8-2009 10541030@unknown@formal@none@1@S@* Data and metadata can change their roles.@@@@1@8@@danf@17-8-2009 10541040@unknown@formal@none@1@S@A poem, as such, would be regarded as data, but if there were a song that used it as lyrics, the whole poem could be attached to an audio file of the song as metadata.@@@@1@35@@danf@17-8-2009 10541050@unknown@formal@none@1@S@Thus, the labeling depends on the point of view.@@@@1@9@@danf@17-8-2009 10541060@unknown@formal@none@1@S@These considerations apply no matter which of the above definitions is considered, except where explicit markup is used to denote what is data and what is metadata.@@@@1@27@@danf@17-8-2009 10541070@unknown@formal@none@1@S@== Use ==@@@@1@3@@danf@17-8-2009 10541080@unknown@formal@none@1@S@Metadata has many different applications; this section lists some of the most common.@@@@1@13@@danf@17-8-2009 10541090@unknown@formal@none@1@S@Metadata is used to speed up and enrich searching for resources.@@@@1@11@@danf@17-8-2009 10541100@unknown@formal@none@1@S@In general, search queries using metadata can save users from performing more complex filter operations manually.@@@@1@16@@danf@17-8-2009 10541110@unknown@formal@none@1@S@It is now common for web browsers (with the notable exception of Mozilla Firefox), P2P applications and media management software to automatically download and locally cache metadata, to improve the speed at which files can be accessed and searched.@@@@1@39@@danf@17-8-2009 
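To make the preceding point concrete, the following minimal Python sketch (not part of the original article; the folder, field names and query values are illustrative assumptions) builds a small metadata index once and then answers a search purely from that index, so the files themselves never have to be re-read for each query.

<source lang="python">
import os

def build_index(folder):
    """Collect cheap file metadata once so later searches need not re-read the files."""
    index = {}
    for name in os.listdir(folder):
        path = os.path.join(folder, name)
        if os.path.isfile(path):
            info = os.stat(path)
            index[name] = {
                "size_bytes": info.st_size,
                "modified": info.st_mtime,
                "extension": os.path.splitext(name)[1].lower(),
            }
    return index

def search(index, extension=None, min_size=0):
    """Answer a query from the metadata index alone; file contents are never opened."""
    return [name for name, meta in index.items()
            if (extension is None or meta["extension"] == extension)
            and meta["size_bytes"] >= min_size]

if __name__ == "__main__":
    idx = build_index(".")                      # the index could be cached to disk for reuse
    print(search(idx, extension=".mp3", min_size=1_000_000))
</source>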
10541120@unknown@formal@none@1@S@Metadata may also be associated with files manually.@@@@1@8@@danf@17-8-2009 10541130@unknown@formal@none@1@S@This is often the case with documents which are scanned into a document storage repository such as FileNet or Documentum.@@@@1@20@@danf@17-8-2009 10541140@unknown@formal@none@1@S@Once the documents have been converted into an electronic format, a user brings the image up in a viewer application, manually reads the document and keys values into an online application to be stored in a metadata repository.@@@@1@38@@danf@17-8-2009 10541150@unknown@formal@none@1@S@Metadata provides additional information to users of the data it describes.@@@@1@11@@danf@17-8-2009 10541160@unknown@formal@none@1@S@This information may be descriptive ("These pictures were taken by children in the school's third grade class.") or algorithmic ("Checksum=139F").@@@@1@20@@danf@17-8-2009 10541170@unknown@formal@none@1@S@Metadata helps to bridge the [[semantic gap]].@@@@1@7@@danf@17-8-2009 10541180@unknown@formal@none@1@S@By telling a computer how data items are related and how these relations can be evaluated automatically, it becomes possible to process even more complex filter and search operations.@@@@1@29@@danf@17-8-2009 10541190@unknown@formal@none@1@S@For example, if a search engine understands that "Van Gogh" was a "Dutch painter", it can answer a search query on "Dutch painters" with a link to a web page about Vincent Van Gogh, although the exact words "Dutch painters" never occur on that page.@@@@1@45@@danf@17-8-2009 10541200@unknown@formal@none@1@S@This approach, called knowledge representation, is of special interest to the [[semantic web]] and [[artificial intelligence]].@@@@1@16@@danf@17-8-2009 10541210@unknown@formal@none@1@S@Certain metadata is designed to optimize [[lossy compression]].@@@@1@8@@danf@17-8-2009 10541220@unknown@formal@none@1@S@For example, if a video has metadata that allows a computer to tell foreground from background, the latter can be compressed more aggressively to achieve a higher compression rate.@@@@1@29@@danf@17-8-2009 10541230@unknown@formal@none@1@S@Some metadata is intended to enable variable content presentation.@@@@1@9@@danf@17-8-2009 10541240@unknown@formal@none@1@S@For example, if a picture has metadata that indicates the most important region — the one where there is a person — an image viewer on a small screen, such as a mobile phone's, can crop the picture to that region and thus show the user the most interesting details.@@@@1@51@@danf@17-8-2009 10541250@unknown@formal@none@1@S@A similar kind of metadata is intended to allow blind people to access diagrams and pictures, by converting them for special output devices or reading their description using [[speech synthesis|text-to-speech]] software.@@@@1@31@@danf@17-8-2009 10541260@unknown@formal@none@1@S@Other descriptive metadata can be used to automate workflows.@@@@1@9@@danf@17-8-2009 10541270@unknown@formal@none@1@S@For example, if a "smart" software tool knows the content and structure of its data, it can convert it automatically and pass it to another "smart" tool as input.@@@@1@27@@danf@17-8-2009 10541280@unknown@formal@none@1@S@As a result, users save the many [[cut, copy and paste|copy-and-paste]] operations required when analyzing data with "dumb" tools.@@@@1@19@@danf@17-8-2009 10541290@unknown@formal@none@1@S@Metadata is becoming an increasingly important part of [[electronic discovery]].@@@@1@10@@danf@17-8-2009 
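As a hedged illustration of the distinction drawn above between descriptive and algorithmic metadata, and of the kind of file metadata that can matter in electronic discovery, the Python sketch below pairs a manually supplied description with mechanically derived values (a checksum, file size and modification time). The file name and description text are invented for the example.

<source lang="python">
import hashlib
import os

def describe_file(path, description):
    """Return one metadata record combining descriptive and algorithmic items."""
    with open(path, "rb") as f:
        checksum = hashlib.md5(f.read()).hexdigest()   # algorithmic: derived from the data itself
    info = os.stat(path)
    return {
        "description": description,                    # descriptive: supplied by a person
        "checksum_md5": checksum,
        "size_bytes": info.st_size,
        "last_modified": info.st_mtime,
    }

if __name__ == "__main__":
    # "class_photo.jpg" is a hypothetical file used only to illustrate the call.
    print(describe_file("class_photo.jpg", "Pictures taken by the third grade class"))
</source>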
10541295@unknown@formal@none@1@S@[http://www.lexbe.com/hp/indepth-e-discovery-rule-metadata.htm] Application and file system metadata derived from [[electronic document]]s and files can be important evidence.@@@@1@16@@danf@17-8-2009 10541300@unknown@formal@none@1@S@Recent changes to the [[Federal Rules of Civil Procedure]] make metadata routinely discoverable as part of [[Civil law (common law)|civil litigation]].@@@@1@21@@danf@17-8-2009 10541310@unknown@formal@none@1@S@Parties to litigation are required to maintain and produce metadata as part of [[discovery (law)|discovery]], and [[spoliation of evidence|spoliation]] of metadata can lead to sanctions.@@@@1@25@@danf@17-8-2009 10541320@unknown@formal@none@1@S@Metadata has become important on the [[World Wide Web]] because of the need to find useful information from the mass of information available.@@@@1@23@@danf@17-8-2009 10541330@unknown@formal@none@1@S@Manually-created metadata adds value because it ensures consistency.@@@@1@8@@danf@17-8-2009 10541340@unknown@formal@none@1@S@If a web page about a certain topic contains a word or phrase, then all web pages about that topic should contain that same word or phrase.@@@@1@27@@danf@17-8-2009 10541350@unknown@formal@none@1@S@Metadata also ensures variety, so that if a topic goes by two names each will be used.@@@@1@17@@danf@17-8-2009 10541360@unknown@formal@none@1@S@For example, an article about "[[sport utility vehicle]]s" would also be [[tag (metadata)|tagged]] "4 wheel drives", "4WDs" and "four wheel drives", as this is how SUVs are known in some countries.@@@@1@31@@danf@17-8-2009 10541370@unknown@formal@none@1@S@Examples of metadata for an [[Compact Disc|audio CD]] include the [[MusicBrainz]] project and [[All Media Guide]]'s [[Allmusic]].@@@@1@17@@danf@17-8-2009 10541380@unknown@formal@none@1@S@Similarly, [[MP3]] files have metadata tags in a format called [[ID3]].@@@@1@11@@danf@17-8-2009 10541390@unknown@formal@none@1@S@== Types of metadata ==@@@@1@5@@danf@17-8-2009 10541400@unknown@formal@none@1@S@Metadata can be classified by:@@@@1@5@@danf@17-8-2009 10541410@unknown@formal@none@1@S@* Content.@@@@1@2@@danf@17-8-2009 10541420@unknown@formal@none@1@S@Metadata can either describe the ''resource'' itself (for example, name and size of a file) or the ''content'' of the resource (for example, "This video shows a boy playing football").@@@@1@30@@danf@17-8-2009 10541430@unknown@formal@none@1@S@* Mutability.@@@@1@2@@danf@17-8-2009 10541440@unknown@formal@none@1@S@With respect to the whole resource, metadata can be either ''immutable'' (for example, the "Title" of a video does not change as the video itself is being played) or ''mutable'' (the "Scene description" does change).@@@@1@35@@danf@17-8-2009 10541450@unknown@formal@none@1@S@* Logical function.@@@@1@3@@danf@17-8-2009 10541460@unknown@formal@none@1@S@There are three layers of logical function: at the bottom the ''subsymbolic'' layer that contains the raw data itself, then the ''symbolic'' layer with metadata describing the raw data, and on the top the ''logical'' layer containing metadata that allows logical reasoning using the symbolic layer@@@@1@46@@danf@17-8-2009 10541470@unknown@formal@none@1@S@== Important issues ==@@@@1@4@@danf@17-8-2009 10541480@unknown@formal@none@1@S@To successfully develop and use metadata, several important issues should be treated with care:@@@@1@14@@danf@17-8-2009 10541490@unknown@formal@none@1@S@=== Metadata risks ===@@@@1@4@@danf@17-8-2009 10541500@unknown@formal@none@1@S@[[Microsoft Office]] files include 
metadata beyond their printable content, such as the original author's name, the creation date of the document, and the amount of time spent editing it.@@@@1@29@@danf@17-8-2009 10541510@unknown@formal@none@1@S@Unintentional disclosure can be awkward or even, in professional practices requiring confidentiality, raise malpractice concerns.@@@@1@15@@danf@17-8-2009 10541520@unknown@formal@none@1@S@Some of Microsoft Office document's metadata can be seen by clicking ''File'' then ''Properties'' from the program's menu.@@@@1@18@@danf@17-8-2009 10541530@unknown@formal@none@1@S@Other metadata is not visible except through external analysis of a file, such as is done in forensics.@@@@1@18@@danf@17-8-2009 10541540@unknown@formal@none@1@S@The author of the Microsoft Word-based [[Melissa (computer worm)|Melissa]] computer virus in 1999 was caught due to Word metadata that uniquely identified the computer used to create the original infected document.@@@@1@31@@danf@17-8-2009 10541550@unknown@formal@none@1@S@=== Metadata lifecycle ===@@@@1@4@@danf@17-8-2009 10541560@unknown@formal@none@1@S@Even in the early phases of planning and designing it is necessary to keep track of all metadata created.@@@@1@19@@danf@17-8-2009 10541570@unknown@formal@none@1@S@It is not economical to start attaching metadata only after the production process has been completed.@@@@1@16@@danf@17-8-2009 10541580@unknown@formal@none@1@S@For example, if metadata created by a digital camera at recording time is not stored immediately, it may have to be restored afterwards manually with great effort.@@@@1@27@@danf@17-8-2009 10541590@unknown@formal@none@1@S@Therefore, it is necessary for different groups of resource producers to cooperate using compatible methods and standards.@@@@1@17@@danf@17-8-2009 10541600@unknown@formal@none@1@S@* Manipulation.@@@@1@2@@danf@17-8-2009 10541610@unknown@formal@none@1@S@Metadata must adapt if the resource it describes changes.@@@@1@9@@danf@17-8-2009 10541620@unknown@formal@none@1@S@It should be merged when two resources are merged.@@@@1@9@@danf@17-8-2009 10541630@unknown@formal@none@1@S@These operations are seldom performed by today's software; for example, image editing programs usually do not keep track of the [[Exchangeable image file format|Exif]] metadata created by digital cameras.@@@@1@29@@danf@17-8-2009 10541640@unknown@formal@none@1@S@* Destruction.@@@@1@2@@danf@17-8-2009 10541650@unknown@formal@none@1@S@It can be useful to keep metadata even after the resource it describes has been destroyed, for example in change histories within a text document or to archive file deletions due to digital rights management.@@@@1@35@@danf@17-8-2009 10541660@unknown@formal@none@1@S@None of today's metadata standards consider this phase.@@@@1@8@@danf@17-8-2009 10541670@unknown@formal@none@1@S@=== Storage ===@@@@1@3@@danf@17-8-2009 10541680@unknown@formal@none@1@S@Metadata can be stored either ''internally'', in the same file as the data, or ''externally'', in a separate file.@@@@1@19@@danf@17-8-2009 10541690@unknown@formal@none@1@S@Metadata that are embedded with content is called ''embedded metadata''.@@@@1@10@@danf@17-8-2009 10541700@unknown@formal@none@1@S@A data repository typically stores the metadata ''detached'' from the data.@@@@1@11@@danf@17-8-2009 10541710@unknown@formal@none@1@S@Both ways have advantages and disadvantages:@@@@1@6@@danf@17-8-2009 10541720@unknown@formal@none@1@S@*Internal storage allows transferring metadata together with the data it describes; thus, metadata is always at hand and can 
be manipulated easily.@@@@1@22@@danf@17-8-2009 10541730@unknown@formal@none@1@S@This method creates high redundancy and does not allow holding metadata together.@@@@1@12@@danf@17-8-2009 10541740@unknown@formal@none@1@S@* External storage allows bundling metadata, for example in a database, for more efficient searching.@@@@1@15@@danf@17-8-2009 10541750@unknown@formal@none@1@S@There is no redundancy and metadata can be transferred simultaneously when using [[streaming media|streaming]].@@@@1@14@@danf@17-8-2009 10541760@unknown@formal@none@1@S@However, as most formats use [[Uniform Resource Identifier|URI]]s for that purpose, the method of how the metadata is linked to its data should be treated with care.@@@@1@27@@danf@17-8-2009 10541770@unknown@formal@none@1@S@What if a resource does not have a URI (resources on a local hard disk or web pages that are created on-the-fly using a content management system)?@@@@1@27@@danf@17-8-2009 10541780@unknown@formal@none@1@S@What if metadata can only be evaluated if there is a connection to the Web, especially when using [[Resource Description Framework|RDF]]?@@@@1@21@@danf@17-8-2009 10541790@unknown@formal@none@1@S@How to realize that a resource is replaced by another with the same name but different content?@@@@1@17@@danf@17-8-2009 10541800@unknown@formal@none@1@S@Moreover, there is the question of data format: storing metadata in a human-readable format such as XML can be useful because users can understand and edit it without specialized tools.@@@@1@30@@danf@17-8-2009 10541810@unknown@formal@none@1@S@On the other hand, these formats are not optimized for storage capacity; it may be useful to store metadata in a binary, non-human-readable format instead to speed up transfer and save memory.@@@@1@32@@danf@17-8-2009 10541820@unknown@formal@none@1@S@== Criticisms ==@@@@1@3@@danf@17-8-2009 10541830@unknown@formal@none@1@S@Although the majority of computer scientists see metadata as a chance for better interoperability, some critics argue:@@@@1@17@@danf@17-8-2009 10541840@unknown@formal@none@1@S@*Metadata is too expensive and time-consuming.@@@@1@6@@danf@17-8-2009 10541850@unknown@formal@none@1@S@The argument is that companies will not produce metadata without need because it costs extra money, and private users also will not produce complex metadata because its creation is very time-consuming.@@@@1@31@@danf@17-8-2009 10541860@unknown@formal@none@1@S@* Metadata is too complicated.@@@@1@5@@danf@17-8-2009 10541870@unknown@formal@none@1@S@Private users will not create metadata because existing formats, especially [[MPEG-7]], are too complicated.@@@@1@14@@danf@17-8-2009 10541880@unknown@formal@none@1@S@As long as there are no automatic tools for creating metadata, it will not be created.@@@@1@16@@danf@17-8-2009 10541890@unknown@formal@none@1@S@* Metadata is subjective and depends on context.@@@@1@8@@danf@17-8-2009 10541900@unknown@formal@none@1@S@Most probably, two persons will attach different metadata to the same resource due to their different points of view.@@@@1@19@@danf@17-8-2009 10541910@unknown@formal@none@1@S@Moreover, metadata can be misinterpreted due to its dependency on context.@@@@1@11@@danf@17-8-2009 10541920@unknown@formal@none@1@S@For example searching for "post-modern art" may miss a certain item because the expression was not in use at the time when that work of art was created, or searching for "pictures taken at 1:00" may produce confusing results due to local time differences.@@@@1@44@@danf@17-8-2009 10541930@unknown@formal@none@1@S@* 
There is no end to metadata.@@@@1@7@@danf@17-8-2009 10541940@unknown@formal@none@1@S@For example, when annotating a match of soccer with metadata, one can describe all the players and their actions in time and stop there.@@@@1@24@@danf@17-8-2009 10541950@unknown@formal@none@1@S@One can also describe the advertisements in the background and the clothes the players wear.@@@@1@15@@danf@17-8-2009 10541960@unknown@formal@none@1@S@One can also describe each fan on the tribune and the clothes they wear.@@@@1@14@@danf@17-8-2009 10541970@unknown@formal@none@1@S@All of this metadata can be interesting to one party or another — such as the spectators, sponsors or a counter-terrorist unit of the police — and even for a simple resource the amount of possible metadata can be gigantic.@@@@1@40@@danf@17-8-2009 10541980@unknown@formal@none@1@S@* Metadata is useless.@@@@1@4@@danf@17-8-2009 10541990@unknown@formal@none@1@S@Many of today's search engines are very efficient at finding text.@@@@1@11@@danf@17-8-2009 10542000@unknown@formal@none@1@S@Other techniques for finding pictures, videos and music (namely query-by-example) will become more and more powerful in the future.@@@@1@19@@danf@17-8-2009 10542010@unknown@formal@none@1@S@Thus, there is no real need for metadata.@@@@1@8@@danf@17-8-2009 10542020@unknown@formal@none@1@S@Opponents of metadata sometimes use the term [[metacrap]] to refer to the unsolved problems of metadata in some scenarios.@@@@1@20@@danf@17-8-2009 10542030@unknown@formal@none@1@S@These people are also referred to as "Meta Haters."@@@@1@9@@danf@17-8-2009 10542040@unknown@formal@none@1@S@== Types ==@@@@1@3@@danf@17-8-2009 10542050@unknown@formal@none@1@S@In general, there are two distinct classes of metadata: structural or control metadata and guide metadata.@@@@1@16@@danf@17-8-2009 10542060@unknown@formal@none@1@S@Structural metadata is used to describe the structure of computer systems such as tables, columns and indexes.@@@@1@17@@danf@17-8-2009 10542070@unknown@formal@none@1@S@Guide metadata is used to help humans find specific items and is usually expressed as a set of keywords in a natural language.@@@@1@23@@danf@17-8-2009 10542080@unknown@formal@none@1@S@Metadata can be divided into three distinct categories:@@@@1@8@@danf@17-8-2009 10542090@unknown@formal@none@1@S@* Descriptive@@@@1@2@@danf@17-8-2009 10542100@unknown@formal@none@1@S@* Administrative@@@@1@2@@danf@17-8-2009 10542110@unknown@formal@none@1@S@* Structural@@@@1@2@@danf@17-8-2009 10542120@unknown@formal@none@1@S@=== Relational database metadata ===@@@@1@5@@danf@17-8-2009 10542130@unknown@formal@none@1@S@Each [[relational database]] system has its own mechanisms for storing metadata.@@@@1@11@@danf@17-8-2009 10542140@unknown@formal@none@1@S@Examples of relational-database metadata include:@@@@1@5@@danf@17-8-2009 10542150@unknown@formal@none@1@S@* Tables of all tables in the database, their names, sizes and number of rows in each table.@@@@1@16@@danf@17-8-2009 10542160@unknown@formal@none@1@S@* Tables of columns in each database, what tables they are used in, and the type of data stored in each column.@@@@1@22@@danf@17-8-2009 10542170@unknown@formal@none@1@S@In database terminology, this set of metadata is referred to as the [[database catalog|catalog]].@@@@1@14@@danf@17-8-2009 10542180@unknown@formal@none@1@S@The [[SQL]] standard specifies a uniform means to access the catalog, called the INFORMATION_SCHEMA, but not all databases implement it, even if they implement other aspects of the SQL standard.@@@@1@30@@danf@17-8-2009 
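For databases that do implement INFORMATION_SCHEMA (PostgreSQL and MySQL are common examples), the catalog can be queried like any other table. The sketch below is an assumption-laden illustration rather than part of the article: it uses the psycopg2 DB-API driver, placeholder connection details and an invented table name.

<source lang="python">
import psycopg2  # any DB-API driver for a database with INFORMATION_SCHEMA would do

# Placeholder connection details; adjust for a real database.
conn = psycopg2.connect(host="localhost", dbname="example", user="reader", password="secret")
cur = conn.cursor()

# Column-level metadata for one (invented) table: names, types and nullability.
cur.execute(
    """
    SELECT column_name, data_type, is_nullable
    FROM information_schema.columns
    WHERE table_name = %s
    ORDER BY ordinal_position
    """,
    ("customers",),
)
for column_name, data_type, is_nullable in cur.fetchall():
    print(column_name, data_type, is_nullable)

cur.close()
conn.close()
</source>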
10542190@unknown@formal@none@1@S@For an example of database-specific metadata access methods, see [[Oracle metadata]].@@@@1@11@@danf@17-8-2009 10542200@unknown@formal@none@1@S@=== Data warehouse metadata ===@@@@1@5@@danf@17-8-2009 10542210@unknown@formal@none@1@S@[[Data warehouse]] metadata systems are sometimes separated into two sections:@@@@1@10@@danf@17-8-2009 10542220@unknown@formal@none@1@S@# '''back room''' metadata that are used for [[Extract, transform, load]] functions to get [[OLTP]] data into a data warehouse@@@@1@20@@danf@17-8-2009 10542230@unknown@formal@none@1@S@# '''front room''' metadata that are used to label screens and create reports@@@@1@13@@danf@17-8-2009 10542240@unknown@formal@none@1@S@Kimball lists the following types of metadata in a data warehouse (See also [http://www.fortunecity.com/skyscraper/oracle/699/orahtml/dbmsmag/9803d05.html]):@@@@1@14@@danf@17-8-2009 10542250@unknown@formal@none@1@S@* [[source system]] metadata@@@@1@4@@danf@17-8-2009 10542260@unknown@formal@none@1@S@** source specifications, such as [[repository|repositories]], and source [[logical schema]]s@@@@1@10@@danf@17-8-2009 10542270@unknown@formal@none@1@S@** source descriptive information, such as ownership descriptions, update frequencies, legal limitations, and [[access method]]s@@@@1@15@@danf@17-8-2009 10542280@unknown@formal@none@1@S@** process information, such as job schedules and extraction code@@@@1@10@@danf@17-8-2009 10542290@unknown@formal@none@1@S@* [[data staging]] metadata@@@@1@4@@danf@17-8-2009 10542300@unknown@formal@none@1@S@** [[data acquisition]] information, such as [[data transmission]] scheduling and results, and file usage@@@@1@14@@danf@17-8-2009 10542310@unknown@formal@none@1@S@** [[dimension table]] management, such as definitions of dimensions, and [[surrogate key]] assignments@@@@1@13@@danf@17-8-2009 10542320@unknown@formal@none@1@S@** [[Program transformation|transformation]] and [[aggregation]], such as [[data enhancement]] and mapping, [[DBMS]] load scripts, and aggregate definitions@@@@1@17@@danf@17-8-2009 10542330@unknown@formal@none@1@S@** audit, job logs and documentation, such as [[data lineage]] records, [[data transform]] logs@@@@1@14@@danf@17-8-2009 10542340@unknown@formal@none@1@S@* DBMS metadata, such as:@@@@1@5@@danf@17-8-2009 10542350@unknown@formal@none@1@S@** DBMS system table contents@@@@1@5@@danf@17-8-2009 10542360@unknown@formal@none@1@S@** processing hints@@@@1@3@@danf@17-8-2009 10542370@unknown@formal@none@1@S@Michael Bracket defines metadata (what he calls "Data resource data") as "any data about the organization's data resource".@@@@1@18@@danf@17-8-2009 10542380@unknown@formal@none@1@S@Adrienne Tannenbaum defines metadata as "the detailed description of instance data.@@@@1@11@@danf@17-8-2009 10542390@unknown@formal@none@1@S@The format and characteristics of populated instance data: instances and values, dependent on the role of the metadata recipient".@@@@1@19@@danf@17-8-2009 10542400@unknown@formal@none@1@S@These definitions are characteristic of the "data about data" definition.@@@@1@10@@danf@17-8-2009 10542410@unknown@formal@none@1@S@=== Business Intelligence metadata ===@@@@1@5@@danf@17-8-2009 10542420@unknown@formal@none@1@S@[[Business Intelligence]] is the process of analyzing large amounts of corporate data, usually stored in large databases such as the [[Data Warehouse]], tracking business performance, detecting patterns and trends, and helping enterprise business users make better decisions.@@@@1@37@@danf@17-8-2009 
10542430@unknown@formal@none@1@S@Business Intelligence metadata describes how data is queried, filtered, analyzed, and displayed in Business Intelligence software tools, such as Reporting tools, OLAP tools, Data Mining tools.@@@@1@26@@danf@17-8-2009 10542440@unknown@formal@none@1@S@Examples:@@@@1@1@@danf@17-8-2009 10542450@unknown@formal@none@1@S@* [[Online analytical processing|OLAP]] metadata: The descriptions and structures of Dimensions, Cubes, Measures (Metrics), Hierarchies, Levels, Drill Paths@@@@1@18@@danf@17-8-2009 10542460@unknown@formal@none@1@S@* Reporting metadata: The descriptions and structures of Reports, Charts, Queries, DataSets, Filters, Variables, Expressions@@@@1@15@@danf@17-8-2009 10542470@unknown@formal@none@1@S@* [[Data Mining]] metadata: The descriptions and structures of DataSets, Algorithms, Queries@@@@1@12@@danf@17-8-2009 10542480@unknown@formal@none@1@S@Business Intelligence metadata can be used to understand how corporate financial reports reported to [[Wall Street]] are calculated, how the revenue, expense and profit are aggregated from individual sales transactions stored in the data warehouse.@@@@1@35@@danf@17-8-2009 10542490@unknown@formal@none@1@S@A good understanding of Business Intelligence metadata is required to solve complex problems such as compliance with corporate governance standards, such as [[Sarbanes Oxley]] (SOX) or Basel II.@@@@1@28@@danf@17-8-2009 10542500@unknown@formal@none@1@S@=== General IT metadata ===@@@@1@5@@danf@17-8-2009 10542510@unknown@formal@none@1@S@In contrast, David Marco, another metadata theorist, defines metadata as "all physical data and knowledge from inside and outside an organization, including information about the physical data, technical and business processes, rules and constraints of the data, and structures of the data used by a corporation."@@@@1@46@@danf@17-8-2009 10542520@unknown@formal@none@1@S@Others have included web services, systems and interfaces.@@@@1@8@@danf@17-8-2009 10542530@unknown@formal@none@1@S@In fact, the entire [[Zachman framework]] (see [[Enterprise Architecture]]) can be represented as metadata.@@@@1@14@@danf@17-8-2009 10542540@unknown@formal@none@1@S@Notice that such definitions expand metadata's scope considerably, to encompass most or all of the data required by the [[Management Information System]]s capability.@@@@1@23@@danf@17-8-2009 10542550@unknown@formal@none@1@S@In this sense, the concept of metadata has significant overlaps with the [[ITIL]] concept of a Configuration Management Database ([[CMDB]]), and also with disciplines such as [[Enterprise Architecture]] and [[IT portfolio management]].@@@@1@32@@danf@17-8-2009 10542560@unknown@formal@none@1@S@This broader definition of metadata has precedent.@@@@1@7@@danf@17-8-2009 10542570@unknown@formal@none@1@S@Third generation corporate repository products (such as those eventually merged into the CA Advantage line) not only store information about data definitions (COBOL copybooks, DBMS schema), but also about the programs accessing those data structures, and the [[Job Control Language]] and batch job infrastructure dependencies as well.@@@@1@47@@danf@17-8-2009 10542580@unknown@formal@none@1@S@These products (some of which are still in production) can provide a very complete picture of a mainframe computing environment, supporting exactly the kinds of impact analysis required for ITIL-based processes such as [[ITIL#Incident Management|Incident]] and [[Change Management (ITIL)|Change Management]].@@@@1@40@@danf@17-8-2009 
10542590@unknown@formal@none@1@S@The [[ITIL]] [http://www.tso.co.uk/itil/ Back Catalogue] includes the ''Data Management'' volume which recognizes the role of these metadata products on the mainframe, posing the [[CMDB]] as the distributed computing equivalent.@@@@1@29@@danf@17-8-2009 10542600@unknown@formal@none@1@S@CMDB vendors however have generally not expanded their scope to include data definitions, and metadata solutions are also available in the distributed world.@@@@1@23@@danf@17-8-2009 10542610@unknown@formal@none@1@S@Determining the appropriate role and scope for each is thus a challenge for large IT organizations requiring the services of both.@@@@1@21@@danf@17-8-2009 10542620@unknown@formal@none@1@S@Since metadata is pervasive, centralized attempts at tracking it need to focus on the most highly leveraged assets.@@@@1@18@@danf@17-8-2009 10542630@unknown@formal@none@1@S@Enterprise Assets may only constitute a small percentage of the entire IT portfolio.@@@@1@13@@danf@17-8-2009 10542640@unknown@formal@none@1@S@Some practitioners have successfully managed IT metadata using the [[Dublin Core]] metamodel.@@@@1@12@@danf@17-8-2009 10542650@unknown@formal@none@1@S@==== IT metadata management products ====@@@@1@6@@danf@17-8-2009 10542660@unknown@formal@none@1@S@First generation data dictionary/metadata repository tools would be those only supporting a specific [[DBMS]], such as [[IDMS]]'s IDD (integrated data dictionary), the [[Information Management System|IMS]] Data Dictionary, and [[ADABAS]]'s Predict.@@@@1@30@@danf@17-8-2009 10542670@unknown@formal@none@1@S@Second generation would be ASG's DATAMANAGER product which could support many different file and DBMS types.@@@@1@16@@danf@17-8-2009 10542680@unknown@formal@none@1@S@Third generation repository products became briefly popular in the early 1990s along with the rise of widespread use of [[RDBMS]] engines such as IBM's [[IBM DB2|DB2]].@@@@1@26@@danf@17-8-2009 10542690@unknown@formal@none@1@S@Fourth generation products link the repository with more [[Extract, transform, load]] tools and can be connected with architectural modeling tools.@@@@1@20@@danf@17-8-2009 10542700@unknown@formal@none@1@S@Examples include [http://www.adaptive.com/products/mm.html Adaptive Metadata Manager] from Adaptive, [http://www.asg.com/products/product_details.asp?code=ROC&id=50 Rochade] from ASG,[http://www.infolibcorp.com/productsOverview.html InfoLibrarian Metadata Integration Framework] and [[Troux Technologies]] Metis Server product.@@@@1@22@@danf@17-8-2009 10542710@unknown@formal@none@1@S@=== File system metadata ===@@@@1@5@@danf@17-8-2009 10542720@unknown@formal@none@1@S@Nearly all [[file system]]s keep metadata about files [[out-of-band]].@@@@1@9@@danf@17-8-2009 10542730@unknown@formal@none@1@S@Some systems keep metadata in [[directory (file systems)|directory]] entries; others in specialized structure like [[inode]]s or even in the name of a file.@@@@1@23@@danf@17-8-2009 10542740@unknown@formal@none@1@S@Metadata can range from simple [[timestamp]]s, [[mode bit]]s, and other special-purpose information used by the implementation itself, to [[icon (computing)|icon]]s and free-text comments, to arbitrary [[attribute-value pair]]s.@@@@1@27@@danf@17-8-2009 10542750@unknown@formal@none@1@S@With more complex and open-ended metadata, it becomes useful to search for files based on the metadata contents.@@@@1@18@@danf@17-8-2009 10542760@unknown@formal@none@1@S@The [[Unix]] [[find]] utility was an early example, although inefficient when scanning hundreds of 
thousands of files on a modern computer system.@@@@1@22@@danf@17-8-2009 10542770@unknown@formal@none@1@S@[[Apple Computer]]'s [[Mac OS X]] operating system supports cataloguing and searching for file metadata through a feature known as [[Spotlight (software)|Spotlight]], as of [[Mac OS X v10.4|version 10.4]].@@@@1@28@@danf@17-8-2009 10542780@unknown@formal@none@1@S@[[Microsoft]] worked on the development of similar functionality with the [[Instant Search]] system in [[Windows Vista]], as well as being present in [[SharePoint Server]].@@@@1@24@@danf@17-8-2009 10542790@unknown@formal@none@1@S@[[Linux]] implements file metadata using [[extended file attributes]].@@@@1@8@@danf@17-8-2009 10542800@unknown@formal@none@1@S@=== Image metadata ===@@@@1@4@@danf@17-8-2009 10542810@unknown@formal@none@1@S@Examples of image files containing metadata include [[Exchangeable image file format]] (EXIF) and [[Tagged Image File Format]] (TIFF).@@@@1@18@@danf@17-8-2009 10542820@unknown@formal@none@1@S@Having metadata about images embedded in TIFF or EXIF files is one way of acquiring additional data about an image.@@@@1@20@@danf@17-8-2009 10542830@unknown@formal@none@1@S@[[Tag (metadata)|Tagging]] pictures with subjects, related emotions, and other descriptive phrases helps Internet users find pictures easily rather than having to search through entire image collections.@@@@1@26@@danf@17-8-2009 10542840@unknown@formal@none@1@S@A prime example of an image tagging service is [[Flickr]], where users upload images and then describe the contents.@@@@1@19@@danf@17-8-2009 10542850@unknown@formal@none@1@S@Other patrons of the site can then search for those tags.@@@@1@11@@danf@17-8-2009 10542860@unknown@formal@none@1@S@Flickr uses a [[folksonomy]]: a free-text keyword system in which the community defines the vocabulary through use rather than through a [[controlled vocabulary]].@@@@1@23@@danf@17-8-2009 10542870@unknown@formal@none@1@S@Users can also tag photos for organization purposes using Adobe's [[Extensible Metadata Platform]] (XMP) language, for example.@@@@1@17@@danf@17-8-2009 10542880@unknown@formal@none@1@S@Digital photography is increasingly making use of technical metadata tags describing the conditions of exposure.@@@@1@15@@danf@17-8-2009 10542890@unknown@formal@none@1@S@Photographers shooting [[RAW image format|Camera RAW]] file formats can use applications such as [[Adobe Bridge]] or Apple Computer's [[Aperture (photography software)|Aperture]] to work with camera metadata for post-processing.@@@@1@28@@danf@17-8-2009 10542900@unknown@formal@none@1@S@=== Audio Metadata ===@@@@1@4@@danf@17-8-2009 10542910@unknown@formal@none@1@S@Audio metadata generally relates to how the data should be written in order for a processor to process it efficiently.@@@@1@21@@danf@17-8-2009 10542920@unknown@formal@none@1@S@These technologies are usually seen in audio engine programming, such as Microsoft's [[Resource Interchange File Format|RIFF (Resource Interchange File Format)]] technology for .wav files.@@@@1@24@@danf@17-8-2009 10542930@unknown@formal@none@1@S@Codecs generally develop their own metadata standards for compression purposes.@@@@1@10@@danf@17-8-2009 10542940@unknown@formal@none@1@S@=== Program metadata ===@@@@1@4@@danf@17-8-2009 10542950@unknown@formal@none@1@S@Metadata is casually used to describe the controlling data used in software architectures that are more abstract or configurable.@@@@1@19@@danf@17-8-2009 10542960@unknown@formal@none@1@S@Most '''[[executable|executable file]]''' formats include what may 
be termed "metadata" that specifies certain, usually configurable, behavioral [[runtime]] characteristics.@@@@1@18@@danf@17-8-2009 10542970@unknown@formal@none@1@S@However, it is difficult if not impossible to precisely distinguish program "metadata" from general aspects of [[Von Neumann architecture|stored-program computing architecture]]; if the machine reads it and acts upon it, it is a computational [[Instruction (computer science)|instruction]], and the prefix "meta" has little significance.@@@@1@44@@danf@17-8-2009 10542980@unknown@formal@none@1@S@In [[Java (programming language)|Java]], the [[Class (file format)|class file format]] contains metadata used by the [[Java compiler]] and the [[Java virtual machine]] to [[dynamic linking|dynamically link]] [[class (computer science)|classes]] and to support [[reflection (computer science)|reflection]].@@@@1@35@@danf@17-8-2009 10542990@unknown@formal@none@1@S@The [[J2SE]] 5.0 version of Java included a [[metadata facility for Java|metadata facility]] to allow additional annotations that are used by [[development tool]]s.@@@@1@23@@danf@17-8-2009 10543000@unknown@formal@none@1@S@In [[MS-DOS]], the [[COM file]] format does ''not'' include metadata, while the [[EXE]] file and Windows [[Portable Executable|PE]] formats do.@@@@1@20@@danf@17-8-2009 10543010@unknown@formal@none@1@S@These metadata can include the company that published the program, the date the program was created, the version number and more.@@@@1@21@@danf@17-8-2009 10543020@unknown@formal@none@1@S@In the [[.NET Framework|Microsoft .NET]] executable format, extra metadata is included to allow [[Reflection (computer science)|reflection]] at runtime.@@@@1@18@@danf@17-8-2009 10543030@unknown@formal@none@1@S@=== Existing software metadata ===@@@@1@5@@danf@17-8-2009 10543040@unknown@formal@none@1@S@[[Object Management Group]] (OMG) has defined metadata format for representing entire existing applications for the purposes of [[software mining]], [[software modernization]] and software assurance.@@@@1@24@@danf@17-8-2009 10543050@unknown@formal@none@1@S@This specification, called the OMG [[Knowledge Discovery Metamodel]] (KDM) is the OMG's foundation for "modeling in reverse".@@@@1@17@@danf@17-8-2009 10543060@unknown@formal@none@1@S@KDM is a common language-independent intermediate representation that provides an integrated view of an entire enterprise application, including its behavior (program flow), data, and structure.@@@@1@25@@danf@17-8-2009 10543070@unknown@formal@none@1@S@One of the applications of KDM is Business Rules Mining.@@@@1@10@@danf@17-8-2009 10543080@unknown@formal@none@1@S@[[Knowledge Discovery Metamodel]] includes a fine grained low-level representation (called "micro KDM"), suitable for performing static analysis of programs.@@@@1@19@@danf@17-8-2009 10543090@unknown@formal@none@1@S@=== Document metadata ===@@@@1@4@@danf@17-8-2009 10543100@unknown@formal@none@1@S@Most programs that create documents, including Microsoft [[SharePoint]], [[Microsoft Office Word|Microsoft Word]] and other [[Microsoft Office]] products, save metadata with the document files.@@@@1@23@@danf@17-8-2009 10543110@unknown@formal@none@1@S@These metadata can contain the name of the person who created the file (obtained from the operating system), the name of the person who last edited the file, how many times the file has been printed, and even how many revisions have been made on the file.@@@@1@47@@danf@17-8-2009 10543120@unknown@formal@none@1@S@Other saved material, such as deleted text (saved in 
case of an undelete command), document comments and the like, is also commonly referred to as "metadata", and the inadvertent inclusion of this material in distributed files has sometimes led to undesirable disclosures.@@@@1@42@@danf@17-8-2009 10543130@unknown@formal@none@1@S@Document Metadata is particularly important in legal environments where litigation can request this sensitive information (metadata) which can include many elements of private detrimental data.@@@@1@25@@danf@17-8-2009 10543140@unknown@formal@none@1@S@This data has been linked to multiple lawsuits that have got corporations into legal complications.@@@@1@15@@danf@17-8-2009 10543150@unknown@formal@none@1@S@Many legal firms today use "Metadata Management Software", also known as "Metadata Removal Tools".@@@@1@14@@danf@17-8-2009 10543160@unknown@formal@none@1@S@This software can be used to clean documents before they are sent outside of their firm.@@@@1@16@@danf@17-8-2009 10543170@unknown@formal@none@1@S@This process, known as metadata management, protects lawfirms from potentially unsafe leaking of sensitive data through [[Electronic Discovery]].@@@@1@18@@danf@17-8-2009 10543180@unknown@formal@none@1@S@For a list of executable formats, see [[object file]].@@@@1@9@@danf@17-8-2009 10543190@unknown@formal@none@1@S@=== Metamodels ===@@@@1@3@@danf@17-8-2009 10543200@unknown@formal@none@1@S@Metadata on Models are called [[Metamodel]]s.@@@@1@6@@danf@17-8-2009 10543210@unknown@formal@none@1@S@In [[Model Driven Engineering]], a [[Model (abstract)|Model]] has to conform to a given [[Metamodel]].@@@@1@14@@danf@17-8-2009 10543220@unknown@formal@none@1@S@According to the [[model-driven architecture|MDA]] guide, a metamodel is a model and each model conforms to a given metamodel.@@@@1@19@@danf@17-8-2009 10543230@unknown@formal@none@1@S@[[Meta-modeling]] allows strict and agile automatic processing of models and metamodels.@@@@1@11@@danf@17-8-2009 10543240@unknown@formal@none@1@S@The [[Object Management Group]] (OMG) defines 4 layers of meta-modeling.@@@@1@10@@danf@17-8-2009 10543250@unknown@formal@none@1@S@Each level of modeling is defined, validated by the next layer:@@@@1@11@@danf@17-8-2009 10543260@unknown@formal@none@1@S@*M0: instance object, data row, record -> "John Smith"@@@@1@9@@danf@17-8-2009 10543270@unknown@formal@none@1@S@* M1: model, schema -> "Customer" UML Class or database Table@@@@1@11@@danf@17-8-2009 10543280@unknown@formal@none@1@S@* M2: metamodel -> [[Unified Modeling Language]] (UML), [[Common Warehouse Metamodel]] (CWM), [[Knowledge Discovery Metamodel]] (KDM)@@@@1@16@@danf@17-8-2009 10543290@unknown@formal@none@1@S@* M3: meta-metamodel -> [[Meta-Object Facility]] (MOF)@@@@1@7@@danf@17-8-2009 10543300@unknown@formal@none@1@S@=== Meta-metadata ===@@@@1@3@@danf@17-8-2009 10543310@unknown@formal@none@1@S@Since metadata are also data, it is possible to have metadata of metadata–"meta-metadata."@@@@1@13@@danf@17-8-2009 10543320@unknown@formal@none@1@S@Machine-generated meta-metadata, such as the reversed index created by a free-text search engine, is generally not considered metadata, though.@@@@1@19@@danf@17-8-2009 10543330@unknown@formal@none@1@S@=== Digital library metadata ===@@@@1@5@@danf@17-8-2009 10543340@unknown@formal@none@1@S@There are three categories of metadata that are frequently used to describe objects in a digital library:@@@@1@17@@danf@17-8-2009 10543350@unknown@formal@none@1@S@# '''descriptive''' - Information describing the intellectual content of the object, such as [[MARC]] cataloguing records, 
finding aids or similar schemes.@@@@1@21@@danf@17-8-2009 10543360@unknown@formal@none@1@S@It is typically used for bibliographic purposes and for search and retrieval.@@@@1@12@@danf@17-8-2009 10543370@unknown@formal@none@1@S@# '''structural''' - Information that ties each object to others to make up logical units (e.g., information that relates individual images of pages from a book to the others that make up the book).@@@@1@34@@danf@17-8-2009 10543380@unknown@formal@none@1@S@# '''administrative''' - Information used to manage the object or control access to it.@@@@1@14@@danf@17-8-2009 10543390@unknown@formal@none@1@S@This may include information on how it was scanned, its storage format, [[copyright]] and licensing information, and information necessary for the [[digital preservation|long-term preservation]] of the digital objects.@@@@1@28@@danf@17-8-2009 10543400@unknown@formal@none@1@S@=== Geospatial metadata ===@@@@1@4@@danf@17-8-2009 10543410@unknown@formal@none@1@S@Metadata that describe geographic objects (such as datasets, maps, features, or simply documents with a geospatial component) have a history going back to at least 1994 (refer [http://libraries.mit.edu/guides/subjects/metadata/standards/fgdc.html MIT Library page on FGDC Metadata]).@@@@1@34@@danf@17-8-2009 10543420@unknown@formal@none@1@S@This class of metadata is described more fully on the [[Geospatial metadata]] page.@@@@1@13@@danf@17-8-2009 10550010@unknown@formal@none@1@S@
Microsoft Windows
@@@@1@2@@danf@17-8-2009 10550020@unknown@formal@none@1@S@'''Microsoft Windows''' is a series of [[software]] [[operating system]]s produced by [[Microsoft]].@@@@1@12@@danf@17-8-2009 10550030@unknown@formal@none@1@S@Microsoft first introduced an operating environment named ''Windows'' in November 1985 as an add-on to [[MS-DOS]] in response to the growing interest in [[graphical user interface]]s (GUIs).@@@@1@27@@danf@17-8-2009 10550040@unknown@formal@none@1@S@Microsoft Windows came to [[Market dominance|dominate]] the world's [[personal computer]] market, overtaking [[Mac OS]], which had been introduced previously.@@@@1@19@@danf@17-8-2009 10550050@unknown@formal@none@1@S@At the 2004 [[International Data Corporation|IDC]] Directions conference, it was stated that Windows had approximately 90% of the [[Client (computing)|client]] operating system market.@@@@1@23@@danf@17-8-2009 10550060@unknown@formal@none@1@S@The most recent client version of Windows is [[Windows Vista]]; the current [[Server (computing)|server]] version is [[Windows Server 2008]].@@@@1@19@@danf@17-8-2009 10550070@unknown@formal@none@1@S@==Versions==@@@@1@1@@danf@17-8-2009 10550080@unknown@formal@none@1@S@The term ''Windows'' collectively describes any or all of several generations of Microsoft (MS) operating system (OS) products.@@@@1@18@@danf@17-8-2009 10550090@unknown@formal@none@1@S@These products are generally categorized as follows:@@@@1@7@@danf@17-8-2009 10550100@unknown@formal@none@1@S@===16-bit operating environments===@@@@1@3@@danf@17-8-2009 10550110@unknown@formal@none@1@S@The early versions of Windows were often thought of as just graphical user interfaces, mostly because they ran on top of [[MS-DOS]] and used it for [[file system]] services.@@@@1@29@@danf@17-8-2009 10550120@unknown@formal@none@1@S@However, even the earliest 16-bit Windows versions already assumed many typical operating system functions, notably, having their own [[executable file format]] and providing their own [[device driver]]s (timer, graphics, printer, mouse, keyboard and sound) for applications.@@@@1@36@@danf@17-8-2009 10550130@unknown@formal@none@1@S@Unlike [[MS-DOS]], Windows allowed users to execute multiple graphical applications at the same time, through [[computer multitasking|cooperative multitasking]].@@@@1@18@@danf@17-8-2009 10550140@unknown@formal@none@1@S@Finally, Windows implemented an elaborate, segment-based, software virtual memory scheme, which allowed it to run applications larger than available memory: code segments and [[resource (Windows)|resource]]s were swapped in and thrown away when memory became scarce, and data segments moved in memory when a given application had relinquished processor control, typically waiting for user input.@@@@1@54@@danf@17-8-2009 10550150@unknown@formal@none@1@S@16-bit Windows versions include [[Windows 1.0]] (1985), [[Windows 2.0]] (1987) and its close relatives, ''[[Windows 2.1x|Windows/286-Windows/386]]''.@@@@1@16@@danf@17-8-2009 10550160@unknown@formal@none@1@S@===Hybrid 16/32-bit operating environments===@@@@1@4@@danf@17-8-2009 10550170@unknown@formal@none@1@S@[[Windows 2.1x|Windows/386]] introduced a 32-bit [[protected mode]] [[kernel (computer science)|kernel]] and [[virtual machine]] monitor.@@@@1@14@@danf@17-8-2009 10550180@unknown@formal@none@1@S@For the duration of a Windows session, it created one or more [[virtual 8086 mode|virtual 8086 environments]] and provided device virtualization for the video card, keyboard, mouse, timer and [[interrupt]] controller inside each of 
them.@@@@1@35@@danf@17-8-2009 10550190@unknown@formal@none@1@S@The user-visible consequence was that it became possible to preemptively multitask multiple MS-DOS environments in separate windows, although graphical MS-DOS applications required full screen mode.@@@@1@25@@danf@17-8-2009 10550200@unknown@formal@none@1@S@Also, Windows applications were multi-tasked cooperatively inside one such virtual 8086 environment.@@@@1@12@@danf@17-8-2009 10550210@unknown@formal@none@1@S@[[Windows 3.0]] (1990) and [[Windows 3.1x|Windows 3.1]] (1992) improved the design, mostly because of [[virtual memory]] and loadable virtual device drivers ([[VxD]]s) which allowed them to share arbitrary devices between multitasked DOS windows.@@@@1@33@@danf@17-8-2009 10550220@unknown@formal@none@1@S@Also, Windows applications could now run in protected mode (when Windows was running in Standard or 386 Enhanced Mode), which gave them access to several megabytes of memory and removed the obligation to participate in the software virtual memory scheme.@@@@1@40@@danf@17-8-2009 10550230@unknown@formal@none@1@S@They still ran inside the same address space, where the segmented memory provided a degree of protection, and multi-tasked cooperatively.@@@@1@20@@danf@17-8-2009 10550240@unknown@formal@none@1@S@For Windows 3.0, Microsoft also rewrote critical operations from [[C (programming language)|C]] into [[Assembly language|assembly]], making this release faster and less memory-hungry than its predecessors.@@@@1@25@@danf@17-8-2009 10550250@unknown@formal@none@1@S@===Hybrid 16/32-bit operating systems===@@@@1@4@@danf@17-8-2009 10550260@unknown@formal@none@1@S@With the introduction of the [[32-bit]] [[Windows 3.1x|Windows for Workgroups 3.11]], Windows was able to stop relying on DOS for file management.@@@@1@22@@danf@17-8-2009 10550270@unknown@formal@none@1@S@Leveraging this, [[Windows 95]] introduced [[Long filename|Long File Names]], reducing the [[8.3 filename]] DOS environment to the role of a [[boot loader]].@@@@1@22@@danf@17-8-2009 10550280@unknown@formal@none@1@S@MS-DOS was now bundled with Windows; this notably made it (partially) aware of long file names when its utilities were run from within Windows.@@@@1@24@@danf@17-8-2009 10550290@unknown@formal@none@1@S@The most important novelty was the possibility of running 32-bit multi-threaded preemptively multitasked graphical programs.@@@@1@15@@danf@17-8-2009 10550300@unknown@formal@none@1@S@However, the necessity of keeping compatibility with 16-bit programs meant the GUI components were still 16-bit only and not fully reentrant, which resulted in reduced performance and stability.@@@@1@28@@danf@17-8-2009 10550310@unknown@formal@none@1@S@There were three releases of Windows 95 (the first in 1995, then subsequent bug-fix versions in 1996 and 1997, only released to OEMs, which added extra features such as [[File Allocation Table|FAT32]] and primitive USB support).@@@@1@36@@danf@17-8-2009 10550320@unknown@formal@none@1@S@Microsoft's next OS was [[Windows 98]]; there were two versions of this (the first in 1998 and the second, named "Windows 98 Second Edition", in 1999).@@@@1@26@@danf@17-8-2009 10550330@unknown@formal@none@1@S@In 2000, Microsoft released [[Windows Me]] (''Me'' standing for ''Millennium Edition''), which used the same core as Windows 98 but adopted some aspects of Windows 2000 and removed the option to boot into DOS mode.@@@@1@34@@danf@17-8-2009 10550340@unknown@formal@none@1@S@It also added a new feature called System Restore, allowing the user to set the
computer's settings back to an earlier date.@@@@1@22@@danf@17-8-2009 10550350@unknown@formal@none@1@S@===32-bit operating systems===@@@@1@3@@danf@17-8-2009 10550360@unknown@formal@none@1@S@The NT family of Windows systems was fashioned and marketed for higher-reliability business use, and was unencumbered by any Microsoft DOS patrimony.@@@@1@23@@danf@17-8-2009 10550370@unknown@formal@none@1@S@The first release was [[Windows NT 3.1]] (1993, numbered "3.1" to match the Windows version and to one-up [[OS/2]] 2.1, IBM's flagship OS, co-developed with Microsoft, which was Windows NT's main competitor at the time), which was followed by [[Windows NT 3.5|NT 3.5]] (1994), [[Windows NT 3.51|NT 3.51]] (1995), [[Windows NT 4.0|NT 4.0]] (1996), and [[Windows 2000]] (essentially NT 5.0).@@@@1@60@@danf@17-8-2009 10550380@unknown@formal@none@1@S@NT 4.0 was the first in this line to implement the "Windows 95" user interface (and the first to include Windows 95's built-in 32-bit runtimes).@@@@1@25@@danf@17-8-2009 10550390@unknown@formal@none@1@S@Microsoft then moved to combine their consumer and business operating systems.@@@@1@11@@danf@17-8-2009 10550400@unknown@formal@none@1@S@[[Windows XP]], coming in both home and professional versions (and later niche market versions for [[tablet PC]]s and [[media center]]s), improved stability, user experience and backwards compatibility.@@@@1@27@@danf@17-8-2009 10550410@unknown@formal@none@1@S@Then, [[Windows Server 2003]] brought [[Windows Server]] up to date with Windows XP.@@@@1@13@@danf@17-8-2009 10550420@unknown@formal@none@1@S@Since then, a new version, [[Windows Vista]], was released, and [[Windows Server 2008]], released on [[February 27]], [[2008]], brings [[Windows Server]] up to date with [[Windows Vista]].@@@@1@27@@danf@17-8-2009 10550430@unknown@formal@none@1@S@[[Windows CE]], Microsoft's offering in the mobile and embedded markets, is also a true 32-bit operating system, designed to run on devices with limited resources.@@@@1@25@@danf@17-8-2009 10550440@unknown@formal@none@1@S@===64-bit operating systems===@@@@1@3@@danf@17-8-2009 10550450@unknown@formal@none@1@S@[[Windows NT]] included support for several different platforms before the [[X86 architecture|x86]]-based [[personal computer]] became dominant in the professional world.@@@@1@20@@danf@17-8-2009 10550460@unknown@formal@none@1@S@Versions of NT from 3.1 to 4.0 variously supported [[PowerPC]], [[DEC Alpha]] and [[MIPS Technologies|MIPS]] R4000, some of which were 64-bit processors, although the operating system treated them as 32-bit processors.@@@@1@31@@danf@17-8-2009 10550470@unknown@formal@none@1@S@With the introduction of the [[Intel]] [[Itanium]] architecture, which is referred to as [[IA-64]], Microsoft released new versions of Windows to support it.@@@@1@23@@danf@17-8-2009 10550480@unknown@formal@none@1@S@Itanium versions of [[Windows XP]] and [[Windows Server 2003]] were released at the same time as their mainstream x86 (32-bit) counterparts.@@@@1@21@@danf@17-8-2009 10550490@unknown@formal@none@1@S@On [[April 25]], [[2005]], Microsoft released [[Windows XP Professional x64 Edition]] and x64 versions of Windows Server 2003 to support the [[x86-64|AMD64/Intel64]] (or ''x64'' in Microsoft terminology) architecture.@@@@1@28@@danf@17-8-2009 10550500@unknown@formal@none@1@S@Microsoft dropped support for the Itanium version of Windows XP in 2005.@@@@1@12@@danf@17-8-2009 10550510@unknown@formal@none@1@S@[[Windows Vista]] is the first end-user version of Windows that Microsoft has released
simultaneously in 32-bit and x64 editions.@@@@1@19@@danf@17-8-2009 10550520@unknown@formal@none@1@S@Windows Vista does not support the Itanium architecture.@@@@1@8@@danf@17-8-2009 10550530@unknown@formal@none@1@S@The modern 64-bit Windows family comprises AMD64/Intel64 versions of [[Windows Vista]], and [[Windows Server 2003]] and [[Windows Server 2008]], in both Itanium and x64 editions.@@@@1@25@@danf@17-8-2009 10550540@unknown@formal@none@1@S@==History==@@@@1@1@@danf@17-8-2009 10550550@unknown@formal@none@1@S@Microsoft has taken two parallel routes in its operating systems.@@@@1@10@@danf@17-8-2009 10550560@unknown@formal@none@1@S@One route has been for the home user and the other has been for the professional IT user.@@@@1@18@@danf@17-8-2009 10550570@unknown@formal@none@1@S@The dual routes have generally led to home versions having greater [[multimedia]] support and less functionality in networking and security, and professional versions having inferior multimedia support and better networking and security.@@@@1@32@@danf@17-8-2009 10550580@unknown@formal@none@1@S@The first version of Microsoft Windows, [[Windows 1.0|version 1.0]], released in November 1985, lacked a degree of functionality, achieved little popularity, and was intended to compete with Apple's own operating system.@@@@1@31@@danf@17-8-2009 10550590@unknown@formal@none@1@S@Windows 1.0 is not a complete operating system; rather, it extends MS-DOS.@@@@1@12@@danf@17-8-2009 10550600@unknown@formal@none@1@S@Microsoft Windows version 2.0 was released in November 1987 and was slightly more popular than its predecessor.@@@@1@17@@danf@17-8-2009 10550610@unknown@formal@none@1@S@Windows 2.03 (release date January 1988) changed the OS from tiled windows to overlapping windows.@@@@1@16@@danf@17-8-2009 10550620@unknown@formal@none@1@S@This change led to Apple Computer filing a suit against Microsoft alleging infringement on Apple's copyrights.@@@@1@19@@danf@17-8-2009 10550630@unknown@formal@none@1@S@Microsoft Windows version 3.0, released in 1990, was the first Microsoft Windows version to achieve broad commercial success, selling 2 million copies in the first six months.[http://www.islandnet.com/~kpolsson/compsoft/soft1991.htm][http://www.thocp.net/companies/microsoft/microsoft_company.htm]@@@@1@27@@danf@17-8-2009 10550635@unknown@formal@none@1@S@It featured improvements to the user interface and to multitasking capabilities.@@@@1@11@@danf@17-8-2009 10550640@unknown@formal@none@1@S@It received a facelift in Windows 3.1, made generally available on [[March 1]], [[1992]].@@@@1@14@@danf@17-8-2009 10550650@unknown@formal@none@1@S@Windows 3.1 support ended on [[December 31]], [[2001]].@@@@1@8@@danf@17-8-2009 10550660@unknown@formal@none@1@S@In July 1993, Microsoft released [[Windows NT]], based on a new kernel.@@@@1@12@@danf@17-8-2009 10550670@unknown@formal@none@1@S@NT was considered to be the professional OS and was the first Windows version to utilize [[preemptive multitasking]].@@@@1@18@@danf@17-8-2009 10550680@unknown@formal@none@1@S@Windows NT would later be retooled to also function as a home operating system, with Windows XP.@@@@1@17@@danf@17-8-2009 10550690@unknown@formal@none@1@S@On August 24th, 1995, Microsoft released [[Windows 95]], a new, and major, consumer version that made further changes to the user interface, and also used [[preemptive multitasking]].@@@@1@27@@danf@17-8-2009 10550700@unknown@formal@none@1@S@Windows 95 was designed to replace not only Windows 3.1, but also Windows for Workgroups, and
MS-DOS.@@@@1@17@@danf@17-8-2009 10550710@unknown@formal@none@1@S@It was also the first Windows operating system to use Plug and Play capabilities.@@@@1@14@@danf@17-8-2009 10550720@unknown@formal@none@1@S@The changes Windows 95 brought to the desktop were revolutionary, as opposed to evolutionary, such as those in Windows 98 and Windows Me.@@@@1@23@@danf@17-8-2009 10550730@unknown@formal@none@1@S@Mainstream support for [[Windows 95]] ended on [[December 31]], [[2000]] and extended support for [[Windows 95]] ended on [[December 31]], [[2001]].@@@@1@21@@danf@17-8-2009 10550740@unknown@formal@none@1@S@The next in the consumer line was Microsoft [[Windows 98]] released on June 25th, 1998.@@@@1@15@@danf@17-8-2009 10550750@unknown@formal@none@1@S@It was substantially criticized for its slowness and for its unreliability compared with [[Windows 95]], but many of its basic problems were later rectified with the release of [[Windows 98]] Second Edition in 1999.@@@@1@34@@danf@17-8-2009 10550760@unknown@formal@none@1@S@Mainstream support for [[Windows 98]] ended on [[June 30]], [[2002]] and extended support for [[Windows 98]] ended on [[July 11]], [[2006]].@@@@1@21@@danf@17-8-2009 10550770@unknown@formal@none@1@S@As part of its "professional" line, Microsoft released [[Windows 2000]] in February 2000.@@@@1@13@@danf@17-8-2009 10550780@unknown@formal@none@1@S@The consumer version following Windows 98 was [[Windows Me]] (Windows Millennium Edition).@@@@1@12@@danf@17-8-2009 10550790@unknown@formal@none@1@S@Released in September 2000, [[Windows Me]] implemented a number of new technologies for Microsoft: most notably publicized was "[[Universal Plug and Play]]."@@@@1@22@@danf@17-8-2009 10550800@unknown@formal@none@1@S@In October 2001, Microsoft released [[Windows XP]], a version built on the Windows NT [[Kernel (computer science)|kernel]] that also retained the consumer-oriented usability of Windows 95 and its successors.@@@@1@29@@danf@17-8-2009 10550810@unknown@formal@none@1@S@This new version was widely praised in computer magazines.@@@@1@9@@danf@17-8-2009 10550820@unknown@formal@none@1@S@It shipped in two distinct editions, "Home" and "Professional", the former lacking many of the superior security and networking features of the Professional edition.@@@@1@24@@danf@17-8-2009 10550830@unknown@formal@none@1@S@Additionally, the first "Media Center" edition was released in 2002, with an emphasis on support for DVD and TV functionality including program recording and a remote control.@@@@1@27@@danf@17-8-2009 10550840@unknown@formal@none@1@S@Mainstream support for [[Windows XP]] will continue until [[April 14]], [[2009]] and extended support will continue until [[April 8]], [[2014]].@@@@1@20@@danf@17-8-2009 10550850@unknown@formal@none@1@S@In April 2003, [[Windows Server 2003]] was introduced, replacing the [[Windows 2000]] line of server products with a number of new features and a strong focus on security; this was followed in December 2005 by Windows Server 2003 R2.@@@@1@39@@danf@17-8-2009 10550860@unknown@formal@none@1@S@On [[January 30]], [[2007]] Microsoft released [[Windows Vista]].@@@@1@8@@danf@17-8-2009 10550870@unknown@formal@none@1@S@It contains a number of [[Features new to Windows Vista|new features]], from a redesigned shell and user interface to significant [[Technical features new to Windows Vista|technical changes]], with a particular focus on [[Security and safety features new to Windows Vista|security features]].@@@@1@41@@danf@17-8-2009 10550880@unknown@formal@none@1@S@It is available in a 
number of [[Windows Vista editions and pricing|different editions]], and has been subject to [[Criticism of Windows Vista|some criticism]].@@@@1@23@@danf@17-8-2009 10550890@unknown@formal@none@1@S@==Timeline of releases==@@@@1@3@@danf@17-8-2009 10550900@unknown@formal@none@1@S@==Security==@@@@1@1@@danf@17-8-2009 10550910@unknown@formal@none@1@S@[[Computer security|Security]] has been a hot topic with Windows for many years, and even Microsoft itself has been the victim of security breaches.@@@@1@23@@danf@17-8-2009 10550920@unknown@formal@none@1@S@Consumer versions of Windows were originally designed for ease-of-use on a single-user PC without a network connection, and did not have security features built in from the outset.@@@@1@28@@danf@17-8-2009 10550930@unknown@formal@none@1@S@[[Windows NT]] and its successors are designed for security (including on a network) and multi-user PCs, but were not designed with Internet security in mind to the same degree, since Internet use was less prevalent when Windows NT was first developed in the early 1990s.@@@@1@42@@danf@17-8-2009 10550940@unknown@formal@none@1@S@These design issues, combined with flawed code (such as [[buffer overflow]]s) and the popularity of Windows, mean that it is a frequent target of [[computer worm|worm]] and [[computer virus|virus]] writers.@@@@1@30@@danf@17-8-2009 10550950@unknown@formal@none@1@S@In June 2005, [[Bruce Schneier]]'s ''Counterpane Internet Security'' reported that it had seen over 1,000 new viruses and worms in the previous six months.@@@@1@24@@danf@17-8-2009 10550960@unknown@formal@none@1@S@Microsoft releases security patches through its [[Windows Update]] service approximately once a month (usually the second Tuesday of the month), although critical updates are made available at shorter intervals when necessary.@@@@1@31@@danf@17-8-2009 10550970@unknown@formal@none@1@S@In Windows 2000 (SP3 and later), Windows XP and Windows Server 2003, updates can be automatically downloaded and installed if the user selects to do so.@@@@1@26@@danf@17-8-2009 10550980@unknown@formal@none@1@S@As a result, Service Pack 2 for Windows XP, as well as Service Pack 1 for Windows Server 2003, were installed by users more quickly than they otherwise might have been.@@@@1@31@@danf@17-8-2009 10550990@unknown@formal@none@1@S@===Windows Defender===@@@@1@2@@danf@17-8-2009 10551000@unknown@formal@none@1@S@On [[6 January]] [[2005]], Microsoft released a beta version of Microsoft AntiSpyware, based upon the previously released [[GIANT Company Software|Giant]] AntiSpyware.@@@@1@21@@danf@17-8-2009 10551010@unknown@formal@none@1@S@On [[14 February]], [[2006]], Microsoft AntiSpyware became [[Windows Defender]] with the release of beta 2.@@@@1@15@@danf@17-8-2009 10551020@unknown@formal@none@1@S@Windows Defender is a freeware program designed to protect against spyware and other unwanted software.@@@@1@15@@danf@17-8-2009 10551030@unknown@formal@none@1@S@[[Windows XP]] and [[Windows Server 2003]] users who have [[Windows Genuine Advantage|genuine]] copies of Microsoft Windows can freely download the program from Microsoft's web site, and Windows Defender ships as part of [[Windows Vista]].@@@@1@34@@danf@17-8-2009 10551040@unknown@formal@none@1@S@===Third-party analysis===@@@@1@2@@danf@17-8-2009 10551050@unknown@formal@none@1@S@In an article based on a report by Symantec, internetnews.com has described Microsoft Windows as having the "fewest number of patches and the shortest average patch development time of the five operating systems it monitored in the last six months
of 2006."@@@@1@42@@danf@17-8-2009 10551060@unknown@formal@none@1@S@And the number of vulnerabilities found in Windows has significantly increased— Windows: 12+, Red Hat + Fedora: 2, Mac OS X: 1, HP-UX: 2, Solaris: 1.@@@@1@26@@danf@17-8-2009 10551070@unknown@formal@none@1@S@A study conducted by [[Kevin Mitnick]] and marketing communications firm Avantgarde in 2004 found that an unprotected and unpatched Windows XP system with Service Pack 1 lasted only 4 minutes on the Internet before it was compromised, and an unprotected and also unpatched [[Windows Server 2003]] system was compromised after being connected to the internet for 8 hours.@@@@1@58@@danf@17-8-2009 10551080@unknown@formal@none@1@S@However, it is important to note that this study does not apply to Windows XP systems running the Service Pack 2 update (released in late 2004), which vastly improved the security of Windows XP.@@@@1@34@@danf@17-8-2009 10551090@unknown@formal@none@1@S@The computer that was running Windows XP Service Pack 2 was not compromised.@@@@1@13@@danf@17-8-2009 10551100@unknown@formal@none@1@S@The [[AOL]] National Cyber Security Alliance Online Safety Study of October 2004 determined that 80% of Windows users were infected by at least one [[spyware]]/[[adware]] product.@@@@1@26@@danf@17-8-2009 10551110@unknown@formal@none@1@S@Much documentation is available describing how to increase the security of Microsoft Windows products.@@@@1@14@@danf@17-8-2009 10551120@unknown@formal@none@1@S@Typical suggestions include deploying Microsoft Windows behind a hardware or software [[firewall]], running [[anti-virus]] and [[anti-spyware]] software, and installing patches as they become available through [[Windows Update]].@@@@1@27@@danf@17-8-2009 10551130@unknown@formal@none@1@S@==Windows Lifecycle Policy==@@@@1@3@@danf@17-8-2009 10551140@unknown@formal@none@1@S@Microsoft has stopped releasing updates and hotfixes for many old Windows operating systems, including all versions of Windows 9x and earlier versions of Windows NT.@@@@1@25@@danf@17-8-2009 10551150@unknown@formal@none@1@S@Windows versions prior to [[Windows XP|XP]] are no longer supported, with the exception of [[Windows 2000]], which is currently in the Extended Support Period, that will end on [[July 13]], [[2010]].@@@@1@31@@danf@17-8-2009 10551160@unknown@formal@none@1@S@Windows XP versions prior to SP2 are no longer supported either.@@@@1@11@@danf@17-8-2009 10551170@unknown@formal@none@1@S@Also, support for [[Windows XP 64-bit Edition]] ended after the release of the more recent [[Windows XP Professional x64 Edition]].@@@@1@20@@danf@17-8-2009 10551180@unknown@formal@none@1@S@No new updates are created for unsupported versions of Windows.@@@@1@10@@danf@17-8-2009 10551190@unknown@formal@none@1@S@==Emulation software==@@@@1@2@@danf@17-8-2009 10551200@unknown@formal@none@1@S@Emulation allows the use of some Windows applications without using Microsoft Windows.@@@@1@12@@danf@17-8-2009 10551210@unknown@formal@none@1@S@These include:@@@@1@2@@danf@17-8-2009 10551220@unknown@formal@none@1@S@* [[Wine (software)|Wine]] - a [[free and open source software]] implementation of the [[Windows API]], allowing one to run many Windows applications on x86-based platforms, including [[Linux]].@@@@1@27@@danf@17-8-2009 10551230@unknown@formal@none@1@S@Wine is technically not an emulator but a "compatibility layer"; while an emulator effectively 'pretends' to be a different CPU, Wine instead makes use of Windows-style APIs to 'simulate' the Windows environment 
directly.@@@@1@33@@danf@17-8-2009 10551240@unknown@formal@none@1@S@** [[CrossOver]] - A Wine package with licensed fonts.@@@@1@9@@danf@17-8-2009 10551250@unknown@formal@none@1@S@Its developers are regular contributors to Wine, and focus on Wine running officially supported applications.@@@@1@15@@danf@17-8-2009 10551260@unknown@formal@none@1@S@** [[Cedega]] - [[TransGaming Technologies]]' proprietary [[Fork (software development)|fork]] of Wine, designed specifically for running games written for Microsoft Windows under Linux.@@@@1@22@@danf@17-8-2009 10551270@unknown@formal@none@1@S@** [[Darwine]] - This project intends to port and develop Wine as well as other supporting tools that will allow [[Darwin (operating system)|Darwin]] and [[Mac OS X]] users to run Microsoft Windows applications, and to provide [[Win32]] [[Application Programming Interface|API]] compatibility at application source code level.@@@@1@46@@danf@17-8-2009 10551280@unknown@formal@none@1@S@* [[ReactOS]] - An open-source OS that is intended to run the same software as Windows, originally designed to imitate Windows NT 4.0, now aiming at Windows XP compatibility.@@@@1@29@@danf@17-8-2009 10551290@unknown@formal@none@1@S@It has been in the [[development stage]] since 1996.@@@@1@9@@danf@17-8-2009 10560010@unknown@formal@none@1@S@
Morphology (linguistics)
@@@@1@2@@danf@17-8-2009 10560020@unknown@formal@none@1@S@'''Morphology''' is the field of [[linguistics]] that studies the internal structure of words.@@@@1@13@@danf@17-8-2009 10560030@unknown@formal@none@1@S@(Words as units in the lexicon are the subject matter of [[lexicology]].)@@@@1@12@@danf@17-8-2009 10560040@unknown@formal@none@1@S@While words are generally accepted as being (with [[clitic]]s) the smallest units of [[syntax]], it is clear that in most (if not all) languages, words can be related to other words by rules.@@@@1@33@@danf@17-8-2009 10560050@unknown@formal@none@1@S@For example, [[English language|English]] speakers recognize that the words ''dog'', ''dogs'', and ''dog-catcher'' are closely related.@@@@1@16@@danf@17-8-2009 10560060@unknown@formal@none@1@S@English speakers recognize these relations from their tacit knowledge of the rules of word-formation in English.@@@@1@16@@danf@17-8-2009 10560070@unknown@formal@none@1@S@They intuit that ''dog'' is to ''dogs'' as ''cat'' is to ''cats''; similarly, ''dog'' is to ''dog-catcher'' as ''dish'' is to ''dishwasher''.@@@@1@22@@danf@17-8-2009 10560080@unknown@formal@none@1@S@The rules understood by the speaker reflect specific patterns (or regularities) in the way words are formed from smaller units and how those smaller units interact in speech.@@@@1@28@@danf@17-8-2009 10560090@unknown@formal@none@1@S@In this way, morphology is the branch of linguistics that studies patterns of word-formation within and across languages, and attempts to formulate rules that model the knowledge of the speakers of those languages.@@@@1@33@@danf@17-8-2009 10560100@unknown@formal@none@1@S@== History ==@@@@1@2@@danf@17-8-2009 10560110@unknown@formal@none@1@S@The history of morphological analysis dates back to the [[ancient India]]n linguist [[Pāṇini]], who formulated the 3,959 rules of [[Sanskrit]] morphology in the text ''[[Aṣṭādhyāyī]]'' by using a constituency grammar.@@@@1@29@@danf@17-8-2009 10560120@unknown@formal@none@1@S@The Graeco-Roman grammatical tradition also engaged in morphological analysis.@@@@1@9@@danf@17-8-2009 10560130@unknown@formal@none@1@S@The term ''morphology'' was coined by [[August Schleicher]] in [[1859]].@@@@1@10@@danf@17-8-2009 10560140@unknown@formal@none@1@S@== Fundamental concepts ==@@@@1@4@@danf@17-8-2009 10560150@unknown@formal@none@1@S@=== Lexemes and word forms ===@@@@1@6@@danf@17-8-2009 10560160@unknown@formal@none@1@S@The word "word" can be used in two different senses, and the distinction between these two senses of "word" is arguably the most important one in morphology.@@@@1@16@@danf@17-8-2009 10560170@unknown@formal@none@1@S@The first sense of "word," the one in which ''dog'' and ''dogs'' are "the same word," is called a '''[[lexeme]]'''.@@@@1@19@@danf@17-8-2009 10560180@unknown@formal@none@1@S@The second sense is called '''word-form'''.@@@@1@6@@danf@17-8-2009 10560190@unknown@formal@none@1@S@We thus say that ''dog'' and ''dogs'' are different forms of the same lexeme.@@@@1@14@@danf@17-8-2009 10560200@unknown@formal@none@1@S@''Dog'' and ''dog-catcher'', on the other hand, are different lexemes; for example, they refer to two different kinds of entities.@@@@1@20@@danf@17-8-2009 10560210@unknown@formal@none@1@S@The form of a word that is chosen conventionally to represent its canonical form is called a [[lemma (linguistics)|lemma]], or '''citation form'''.@@@@1@25@@danf@17-8-2009 10560220@unknown@formal@none@1@S@==== Prosodic word vs.
morphological word ====@@@@1@7@@danf@17-8-2009 10560230@unknown@formal@none@1@S@Here are examples from other languages of the failure of a single phonological word to coincide with a single morphological word-form.@@@@1@21@@danf@17-8-2009 10560240@unknown@formal@none@1@S@In Latin, one way to express the concept of 'NOUN-PHRASE1 and NOUN-PHRASE2' (as in "apples and oranges") is to suffix '-que' to the second noun phrase: "apples oranges-and", as it were.@@@@1@33@@danf@17-8-2009 10560250@unknown@formal@none@1@S@An extreme level of this theoretical quandary posed by some phonological words is provided by the Kwak'wala language.@@@@1@18@@danf@17-8-2009 10560260@unknown@formal@none@1@S@In Kwak'wala, as in a great many other languages, meaning relations between nouns, including possession and "semantic case", are formulated by affixes instead of by independent "words".@@@@1@27@@danf@17-8-2009 10560270@unknown@formal@none@1@S@The three word English phrase, "with his club", where 'with' identifies its dependent noun phrase as an instrument and 'his' denotes a possession relation, would consist of two words or even just one word in many languages.@@@@1@37@@danf@17-8-2009 10560280@unknown@formal@none@1@S@But affixation for semantic relations in Kwak'wala differs dramatically (from the viewpoint of those whose language is not Kwak'wala) from such affixation in other languages for this reason: the affixes phonologically attach not to the lexeme they pertain to semantically, but to the ''preceding'' lexeme.@@@@1@45@@danf@17-8-2009 10560290@unknown@formal@none@1@S@Consider the following example (in Kwakw'ala, sentences begin with what corresponds to an English verb):@@@@1@15@@danf@17-8-2009 10560300@unknown@formal@none@1@S@kwixʔid-i-da bəgwanəmai-χ-a q'asa-s-isi t'alwagwayu@@@@1@4@@danf@17-8-2009 10560310@unknown@formal@none@1@S@Morpheme by morpheme translation:@@@@1@4@@danf@17-8-2009 10560320@unknown@formal@none@1@S@kwixʔid-i-da = clubbed-PIVOT-DETERMINER@@@@1@4@@danf@17-8-2009 10560330@unknown@formal@none@1@S@bəgwanəma-χ-a = man-ACCUSATIVE-DETERMINER@@@@1@4@@danf@17-8-2009 10560340@unknown@formal@none@1@S@q'asa-s-is = otter-INSTRUMENTAL-3.PERSON.SINGULAR-POSSESSIVE@@@@1@4@@danf@17-8-2009 10560350@unknown@formal@none@1@S@t'alwagwayu = club.@@@@1@3@@danf@17-8-2009 10560360@unknown@formal@none@1@S@"the man clubbed the otter with his club"@@@@1@8@@danf@17-8-2009 10560370@unknown@formal@none@1@S@(Notation notes:@@@@1@2@@danf@17-8-2009 10560380@unknown@formal@none@1@S@1. accusative case marks an entity that something is done to.@@@@1@11@@danf@17-8-2009 10560390@unknown@formal@none@1@S@2. determiners are words such as "the", "this", "that".@@@@1@9@@danf@17-8-2009 10560400@unknown@formal@none@1@S@3. 
the concept of "pivot" is a theoretical construct that is not relevant to this discussion.)@@@@1@16@@danf@17-8-2009 10560410@unknown@formal@none@1@S@That is, to the speaker of Kwak'wala, the sentence does not contain the "words" 'him-the-otter' or 'with-his-club'. Instead, the markers -''i-da'' (PIVOT-'the'), referring to ''man'', attach not to ''bəgwanəma'' ('man'), but instead to the "verb"; the markers -''χ-a'' (ACCUSATIVE-'the'), referring to ''otter'', attach to ''bəgwanəma'' instead of to ''q'asa'' ('otter'), etc.@@@@1@53@@danf@17-8-2009 10560420@unknown@formal@none@1@S@To summarize differently: a speaker of Kwak'wala does ''not'' perceive the sentence to consist of these phonological words:@@@@1@18@@danf@17-8-2009 10560430@unknown@formal@none@1@S@kwixʔid i-da-bəgwanəma χ-a-q'asa s-isi-t'alwagwayu@@@@1@4@@danf@17-8-2009 10560440@unknown@formal@none@1@S@"clubbed PIVOT-the-mani hit-the-otter with-hisi-club"@@@@1@4@@danf@17-8-2009 10560450@unknown@formal@none@1@S@A central publication on this topic is the recent volume edited by Dixon and Aikhenvald (2007), examining the mismatch between prosodic-phonological and grammatical definitions of "word" in various Amazonian, Australian Aboriginal, Caucasian, Eskimo, Indo-European, Native North American, and West African languages, and in sign languages.@@@@1@45@@danf@17-8-2009 10560460@unknown@formal@none@1@S@Apparently, a wide variety of languages make use of the hybrid linguistic unit known as the clitic, which possesses the grammatical features of independent words but the prosodic-phonological lack of freedom of bound morphemes.@@@@1@30@@danf@17-8-2009 10560470@unknown@formal@none@1@S@The intermediate status of clitics poses a considerable challenge to linguistic theory.@@@@1@12@@danf@17-8-2009 10560480@unknown@formal@none@1@S@=== Inflection vs.
word-formation ===@@@@1@5@@danf@17-8-2009 10560490@unknown@formal@none@1@S@Given the notion of a lexeme, it is possible to distinguish two kinds of morphological rules.@@@@1@16@@danf@17-8-2009 10560500@unknown@formal@none@1@S@Some morphological rules relate to different forms of the same lexeme; while other rules relate to different lexemes.@@@@1@18@@danf@17-8-2009 10560510@unknown@formal@none@1@S@Rules of the first kind are called '''[[Inflection|inflectional rules]]''', while those of the second kind are called '''[[word formation|word-formation]]'''.@@@@1@19@@danf@17-8-2009 10560520@unknown@formal@none@1@S@The English plural, as illustrated by ''dog'' and ''dogs'', is an inflectional rule; compounds like ''dog-catcher'' or ''dishwasher'' provide an example of a word-formation rule.@@@@1@25@@danf@17-8-2009 10560530@unknown@formal@none@1@S@Informally, word-formation rules form "new words" (that is, new lexemes), while inflection rules yield variant forms of the "same" word (lexeme).@@@@1@21@@danf@17-8-2009 10560540@unknown@formal@none@1@S@There is a further distinction between two kinds of word-formation: [[Derivation (linguistics)|derivation]] and [[Compound (linguistics)|compounding]].@@@@1@15@@danf@17-8-2009 10560550@unknown@formal@none@1@S@Compounding is a process of word-formation that involves combining complete word-forms into a single '''compound''' form; ''dog-catcher'' is therefore a compound, because both ''dog'' and ''catcher'' are complete word-forms in their own right before the compounding process has been applied, and are subsequently treated as one form.@@@@1@47@@danf@17-8-2009 10560560@unknown@formal@none@1@S@Derivation involves [[affix]]ing [[bound morpheme|bound]] (non-independent) forms to existing lexemes, whereby the addition of the affix '''derives''' a new lexeme.@@@@1@20@@danf@17-8-2009 10560570@unknown@formal@none@1@S@One example of derivation is clear in this case: the word ''independent'' is derived from the word ''dependent'' by prefixing it with the derivational prefix ''in-'', while ''dependent'' itself is derived from the verb ''depend''.@@@@1@35@@danf@17-8-2009 10560580@unknown@formal@none@1@S@The distinction between inflection and word-formation is not at all clear-cut.@@@@1@11@@danf@17-8-2009 10560590@unknown@formal@none@1@S@There are many examples where linguists fail to agree whether a given rule is inflection or word-formation.@@@@1@17@@danf@17-8-2009 10560600@unknown@formal@none@1@S@The next section will attempt to clarify this distinction.@@@@1@9@@danf@17-8-2009 10560610@unknown@formal@none@1@S@=== Paradigms and morphosyntax ===@@@@1@5@@danf@17-8-2009 10560620@unknown@formal@none@1@S@A '''paradigm''' is the complete set of related word-forms associated with a given lexeme.@@@@1@14@@danf@17-8-2009 10560630@unknown@formal@none@1@S@The familiar examples of paradigms are the [[Grammatical conjugation|conjugations]] of verbs, and the [[declension]]s of nouns.@@@@1@16@@danf@17-8-2009 10560640@unknown@formal@none@1@S@Accordingly, the word-forms of a lexeme may be arranged conveniently into tables, by classifying them according to shared inflectional categories such as [[grammatical tense|tense]], [[grammatical aspect|aspect]], [[grammatical mood|mood]], [[grammatical number|number]], [[grammatical gender|gender]] or [[grammatical case|case]].@@@@1@35@@danf@17-8-2009 10560650@unknown@formal@none@1@S@For example, the personal pronouns in English can be organized into tables, using the categories of person (1st., 2nd., 3rd.), number (singular vs. 
plural), gender (masculine, feminine, neuter), and [[grammatical case|case]] (subjective, objective, and possessive).@@@@1@35@@danf@17-8-2009 10560660@unknown@formal@none@1@S@See [[English personal pronouns]] for the details.@@@@1@7@@danf@17-8-2009 10560670@unknown@formal@none@1@S@The inflectional categories used to group word-forms into paradigms cannot be chosen arbitrarily; they must be categories that are relevant to stating the [[syntax|syntactic rules]] of the language.@@@@1@28@@danf@17-8-2009 10560680@unknown@formal@none@1@S@For example, person and number are categories that can be used to define paradigms in English, because English has [[Agreement (linguistics)|grammatical agreement]] rules that require the verb in a sentence to appear in an inflectional form that matches the person and number of the subject.@@@@1@45@@danf@17-8-2009 10560690@unknown@formal@none@1@S@In other words, the syntactic rules of English care about the difference between ''dog'' and ''dogs'', because the choice between these two forms determines which form of the verb is to be used.@@@@1@33@@danf@17-8-2009 10560700@unknown@formal@none@1@S@In contrast, however, no syntactic rule of English cares about the difference between ''dog'' and ''dog-catcher'', or ''dependent'' and ''independent''.@@@@1@20@@danf@17-8-2009 10560710@unknown@formal@none@1@S@The first two are just nouns, and the second two just adjectives, and they generally behave like any other noun or adjective behaves.@@@@1@23@@danf@17-8-2009 10560720@unknown@formal@none@1@S@An important difference between inflection and word-formation is that inflected word-forms of lexemes are organized into paradigms, which are defined by the requirements of syntactic rules, whereas the rules of word-formation are not restricted by any corresponding requirements of syntax.@@@@1@40@@danf@17-8-2009 10560730@unknown@formal@none@1@S@Inflection is therefore said to be relevant to syntax, and word-formation is not.@@@@1@13@@danf@17-8-2009 10560740@unknown@formal@none@1@S@The part of morphology that covers the relationship between [[syntax]] and morphology is called morphosyntax, and it concerns itself with inflection and paradigms, but not with word-formation or compounding.@@@@1@29@@danf@17-8-2009 10560750@unknown@formal@none@1@S@=== Allomorphy ===@@@@1@3@@danf@17-8-2009 10560760@unknown@formal@none@1@S@In the exposition above, morphological rules are described as analogies between word-forms: ''dog'' is to ''dogs'' as ''cat'' is to ''cats'', and as ''dish'' is to ''dishes''.@@@@1@27@@danf@17-8-2009 10560770@unknown@formal@none@1@S@In this case, the analogy applies both to the form of the words and to their meaning: in each pair, the first word means "one of X", while the second "two or more of X", and the difference is always the plural form ''-s'' affixed to the second word, signaling the key distinction between singular and plural entities.@@@@1@58@@danf@17-8-2009 10560780@unknown@formal@none@1@S@One of the largest sources of complexity in morphology is that this one-to-one correspondence between meaning and form scarcely applies to every case in the language.@@@@1@26@@danf@17-8-2009 10560790@unknown@formal@none@1@S@In English, we have word form pairs like ''ox/oxen'', ''goose/geese'', and ''sheep/sheep'', where the difference between the singular and the plural is signaled in a way that departs from the regular pattern, or is not signaled at all.@@@@1@38@@danf@17-8-2009 10560800@unknown@formal@none@1@S@Even cases considered "regular", with the final ''-s'', are not 
so simple; the ''-s'' in ''dogs'' is not pronounced the same way as the ''-s'' in ''cats'', and in a plural like ''dishes'', an "extra" vowel appears before the ''-s''.@@@@1@40@@danf@17-8-2009 10560810@unknown@formal@none@1@S@These cases, where the same distinction is effected by alternative forms of a "word", are called '''[[allomorph]]y'''.@@@@1@17@@danf@17-8-2009 10560820@unknown@formal@none@1@S@Phonological rules constrain which sounds can appear next to each other in a language, and morphological rules, when applied blindly, would often violate phonological rules, by resulting in sound sequences that are prohibited in the language in question.@@@@1@38@@danf@17-8-2009 10560830@unknown@formal@none@1@S@For example, to form the plural of ''dish'' by simply appending an ''-s'' to the end of the word would result in the form *{{IPA|[dɪʃs]}}, which is not permitted by the [[phonotactics]] of English.@@@@1@34@@danf@17-8-2009 10560840@unknown@formal@none@1@S@In order to "rescue" the word, a vowel sound is inserted between the root and the plural marker, and {{IPA|[dɪʃəz]}} results.@@@@1@21@@danf@17-8-2009 10560850@unknown@formal@none@1@S@Similar rules apply to the pronunciation of the ''-s'' in ''dogs'' and ''cats'': it depends on the quality (voiced vs. unvoiced) of the final preceding [[phoneme]].@@@@1@26@@danf@17-8-2009 10560860@unknown@formal@none@1@S@=== Lexical morphology ===@@@@1@4@@danf@17-8-2009 10560870@unknown@formal@none@1@S@[[Lexical morphology]] is the branch of morphology that deals with the [[lexicon]], which, morphologically conceived, is the collection of [[lexeme]]s in a language.@@@@1@23@@danf@17-8-2009 10560880@unknown@formal@none@1@S@As such, it concerns itself primarily with word-formation: derivation and compounding.@@@@1@11@@danf@17-8-2009 10560890@unknown@formal@none@1@S@== Models of morphology ==@@@@1@5@@danf@17-8-2009 10560900@unknown@formal@none@1@S@There are three principal approaches to morphology, which each try to capture the distinctions above in different ways.@@@@1@18@@danf@17-8-2009 10560910@unknown@formal@none@1@S@These are,@@@@1@2@@danf@17-8-2009 10560920@unknown@formal@none@1@S@* [[Morpheme-based morphology]], which makes use of an [[Item-and-Arrangment (Morphology)|Item-and-Arrangement]] approach.@@@@1@11@@danf@17-8-2009 10560930@unknown@formal@none@1@S@* [[Lexeme-based morphology]], which normally makes use of an [[Item-and-Process (Morphology)|Item-and-Process]] approach.@@@@1@12@@danf@17-8-2009 10560940@unknown@formal@none@1@S@* [[Word-based morphology]], which normally makes use of a [[Word-and-paradigm morphology|Word-and-Paradigm]] approach.@@@@1@12@@danf@17-8-2009 10560950@unknown@formal@none@1@S@Note that while the associations indicated between the concepts in each item in that list is very strong, it is not absolute.@@@@1@22@@danf@17-8-2009 10560960@unknown@formal@none@1@S@=== Morpheme-based morphology ===@@@@1@4@@danf@17-8-2009 10560970@unknown@formal@none@1@S@In [[morpheme-based morphology]], word-forms are analyzed as arrangements of [[morpheme]]s.@@@@1@10@@danf@17-8-2009 10560980@unknown@formal@none@1@S@A '''morpheme''' is defined as the minimal meaningful unit of a language.@@@@1@12@@danf@17-8-2009 10560990@unknown@formal@none@1@S@In a word like ''independently'', we say that the morphemes are ''in-'', ''depend'', ''-ent'', and ''ly''; ''depend'' is the [[root (linguistics)|root]] and the other morphemes are, in this case, derivational affixes.@@@@1@31@@danf@17-8-2009 10561000@unknown@formal@none@1@S@In a word like ''dogs'', we say that 
''dog'' is the root, and that ''-s'' is an inflectional morpheme.@@@@1@19@@danf@17-8-2009 10561010@unknown@formal@none@1@S@This way of analyzing word-forms as if they were made of morphemes put after each other like beads on a string, is called [[Item-and-Arrangment (Morphology)|Item-and-Arrangement]].@@@@1@25@@danf@17-8-2009 10561020@unknown@formal@none@1@S@The morpheme-based approach is the first one that beginners to morphology usually think of, and which laymen tend to find the most obvious.@@@@1@23@@danf@17-8-2009 10561030@unknown@formal@none@1@S@This is so to such an extent that very often beginners think that morphemes are an inevitable, fundamental notion of morphology, and many five-minute explanations of morphology are, in fact, five-minute explanations of morpheme-based morphology.@@@@1@35@@danf@17-8-2009 10561040@unknown@formal@none@1@S@This is, however, not so.@@@@1@5@@danf@17-8-2009 10561050@unknown@formal@none@1@S@The fundamental idea of morphology is that the words of a language are related to each other by different kinds of rules.@@@@1@22@@danf@17-8-2009 10561060@unknown@formal@none@1@S@Analyzing words as sequences of morphemes is a way of describing these relations, but is not the only way.@@@@1@19@@danf@17-8-2009 10561070@unknown@formal@none@1@S@In actual academic linguistics, morpheme-based morphology certainly has many adherents, but is by no means the dominant approach.@@@@1@18@@danf@17-8-2009 10561080@unknown@formal@none@1@S@=== Lexeme-based morphology ===@@@@1@4@@danf@17-8-2009 10561090@unknown@formal@none@1@S@[[Lexeme-based morphology]] is (usually) an [[Item-and-Process (Morphology)|Item-and-Process]] approach.@@@@1@8@@danf@17-8-2009 10561100@unknown@formal@none@1@S@Instead of analyzing a word-form as a set of morphemes arranged in sequence, a word-form is said to be the result of applying rules that ''alter'' a word-form or stem in order to produce a new one.@@@@1@37@@danf@17-8-2009 10561110@unknown@formal@none@1@S@An inflectional rule takes a stem, changes it as is required by the rule, and outputs a word-form; a derivational rule takes a stem, changes it as per its own requirements, and outputs a derived stem; a compounding rule takes word-forms, and similarly outputs a compound stem.@@@@1@47@@danf@17-8-2009 10561120@unknown@formal@none@1@S@=== Word-based morphology ===@@@@1@4@@danf@17-8-2009 10561130@unknown@formal@none@1@S@[[Word-based morphology]] is a (usually) [[Word-and-paradigm morphology|Word-and-paradigm]] approach.@@@@1@8@@danf@17-8-2009 10561140@unknown@formal@none@1@S@This theory takes paradigms as a central notion.@@@@1@8@@danf@17-8-2009 10561150@unknown@formal@none@1@S@Instead of stating rules to combine morphemes into word-forms, or to generate word-forms from stems, word-based morphology states generalizations that hold between the forms of inflectional paradigms.@@@@1@27@@danf@17-8-2009 10561160@unknown@formal@none@1@S@The major point behind this approach is that many such generalizations are hard to state with either of the other approaches.@@@@1@21@@danf@17-8-2009 10561170@unknown@formal@none@1@S@The examples are usually drawn from [[fusional language]]s, where a given "piece" of a word, which a morpheme-based theory would call an inflectional morpheme, corresponds to a combination of grammatical categories, for example, "third person plural."@@@@1@36@@danf@17-8-2009 10561180@unknown@formal@none@1@S@Morpheme-based theories usually have no problems with this situation, since one just says that a given morpheme has two 
categories.@@@@1@20@@danf@17-8-2009 10561190@unknown@formal@none@1@S@Item-and-Process theories, on the other hand, often break down in cases like these, because they all too often assume that there will be two separate rules here, one for third person, and the other for plural, but the distinction between them turns out to be artificial.@@@@1@46@@danf@17-8-2009 10561200@unknown@formal@none@1@S@Word-and-Paradigm approaches treat these as whole words that are related to each other by [[analogy|analogical]] rules.@@@@1@16@@danf@17-8-2009 10561210@unknown@formal@none@1@S@Words can be categorized based on the pattern they fit into.@@@@1@11@@danf@17-8-2009 10561220@unknown@formal@none@1@S@This applies both to existing words and to new ones.@@@@1@10@@danf@17-8-2009 10561230@unknown@formal@none@1@S@Application of a pattern different than the one that has been used historically can give rise to a new word, such as ''older'' replacing ''elder'' (where ''older'' follows the normal pattern of [[adjective|adjectival]] [[superlative]]s) and ''cows'' replacing ''kine'' (where ''cows'' fits the regular pattern of plural formation).@@@@1@47@@danf@17-8-2009 10561240@unknown@formal@none@1@S@While a Word-and-Paradigm approach can explain this easily, other approaches have difficulty with phenomena such as this.@@@@1@17@@danf@17-8-2009 10561250@unknown@formal@none@1@S@== Morphological typology ==@@@@1@4@@danf@17-8-2009 10561260@unknown@formal@none@1@S@In the 19th century, philologists devised a now classic classification of languages according to their morphology.@@@@1@16@@danf@17-8-2009 10561270@unknown@formal@none@1@S@According to this typology, some languages are [[isolating language|isolating]], and have little to no morphology; others are [[agglutinating language|agglutinative]], and their words tend to have lots of easily-separable morphemes; while others yet are inflectional or [[fusional language|fusional]], because their inflectional morphemes are said to be "fused" together.@@@@1@47@@danf@17-8-2009 10561280@unknown@formal@none@1@S@This leads to one bound morpheme conveying multiple pieces of information.@@@@1@11@@danf@17-8-2009 10561290@unknown@formal@none@1@S@The classic example of an isolating language is [[Chinese language|Chinese]]; the classic example of an agglutinative language is [[Turkish language|Turkish]]; both [[Latin language|Latin]] and [[Greek language|Greek]] are classic examples of fusional languages.@@@@1@32@@danf@17-8-2009 10561300@unknown@formal@none@1@S@Considering the variability of the world's languages, it becomes clear that this classification is not at all clear-cut, and many languages do not neatly fit any one of these types, and some fit in more than one.@@@@1@37@@danf@17-8-2009 10561310@unknown@formal@none@1@S@A continuum of complex morphology of language may be adapted when considering languages.@@@@1@13@@danf@17-8-2009 10561320@unknown@formal@none@1@S@The three models of morphology stem from attempts to analyze languages that more or less match different categories in this typology.@@@@1@21@@danf@17-8-2009 10561330@unknown@formal@none@1@S@The Item-and-Arrangement approach fits very naturally with agglutinative languages; while the Item-and-Process and Word-and-Paradigm approaches usually address fusional languages.@@@@1@19@@danf@17-8-2009 10561340@unknown@formal@none@1@S@The reader should also note that the classical typology also mostly applies to inflectional morphology.@@@@1@15@@danf@17-8-2009 10561350@unknown@formal@none@1@S@There is very little fusion going on with 
word-formation.@@@@1@9@@danf@17-8-2009 10561360@unknown@formal@none@1@S@Languages may be classified as synthetic or analytic in their word formation, depending on the preferred way of expressing notions that are not inflectional: either by using word-formation (synthetic), or by using syntactic phrases (analytic).@@@@1@35@@danf@17-8-2009 10570010@unknown@formal@none@1@S@
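The inflectional rules and allomorphy described in the "Allomorphy" section above can be illustrated with a short program. The sketch below is an editorial illustration and not part of the article's source material: it is written in Python, it lists a few irregular plurals as exceptions, and it uses word-final spelling as a simplified stand-in for the phonological conditioning of the regular ''-s''/''-es'' alternation; all names in it are invented for the example.

<source lang="python">
# Minimal, illustrative sketch of the English plural rules discussed in the
# "Allomorphy" section: irregular forms are listed exceptions, and the regular
# suffix surfaces with an extra vowel ("-es") after sibilants, mirroring the
# [-s] / [-z] / [-ez] alternation described in the article.  This is a toy
# example, not a complete morphological analyzer.

IRREGULAR_PLURALS = {"ox": "oxen", "goose": "geese", "sheep": "sheep"}

# Orthographic stand-ins for word-final sibilant sounds after which the
# extra vowel appears (as in "dish" -> "dishes").
SIBILANT_ENDINGS = ("s", "sh", "ch", "x", "z")

def pluralize(noun: str) -> str:
    """Return the plural word-form of an English noun (simplified)."""
    if noun in IRREGULAR_PLURALS:          # irregular paradigm cells
        return IRREGULAR_PLURALS[noun]
    if noun.endswith(SIBILANT_ENDINGS):    # allomorph with inserted vowel: -es
        return noun + "es"
    return noun + "s"                      # default allomorph: -s

if __name__ == "__main__":
    for word in ["dog", "cat", "dish", "ox", "sheep"]:
        print(word, "->", pluralize(word))
</source>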
Named entity recognition
@@@@1@3@@danf@17-8-2009 10570020@unknown@formal@none@1@S@'''Named entity recognition''' (NER) (also known as '''entity identification (EI)''' and '''entity extraction''') is a subtask of [[information extraction]] that seeks to locate and classify atomic elements in text into predefined categories such as the names of persons, organizations, locations, expressions of times, quantities, monetary values, percentages, etc.@@@@1@48@@danf@17-8-2009 10570030@unknown@formal@none@1@S@For example, a NER system producing [[Message Understanding Conference|MUC]]-style output might [[Metadata|tag]] the sentence,@@@@1@14@@danf@17-8-2009 10570040@unknown@formal@none@1@S@:''Jim bought 300 shares of Acme Corp. in 2006.''@@@@1@9@@danf@17-8-2009 10570050@unknown@formal@none@1@S@:''<ENAMEX TYPE="PERSON">Jim</ENAMEX> bought <NUMEX TYPE="QUANTITY">300</NUMEX> shares of <ENAMEX TYPE="ORGANIZATION">Acme Corp.</ENAMEX> in <TIMEX TYPE="DATE">2006</TIMEX>.''@@@@1@13@@danf@17-8-2009 10570060@unknown@formal@none@1@S@NER systems have been created that use linguistic [[formal grammar|grammar]]-based techniques as well as [[statistical model]]s.@@@@1@16@@danf@17-8-2009 10570070@unknown@formal@none@1@S@Hand-crafted grammar-based systems typically obtain better results, but at the cost of months of work by experienced [[Linguistics|linguists]].@@@@1@18@@danf@17-8-2009 10570080@unknown@formal@none@1@S@Statistical NER systems typically require a large amount of manually [[annotation|annotated]] training data.@@@@1@13@@danf@17-8-2009 10570090@unknown@formal@none@1@S@Since about 1998, there has been a great deal of interest in entity identification in the [[molecular biology]], [[bioinformatics]], and medical [[natural language processing]] communities.@@@@1@25@@danf@17-8-2009 10570100@unknown@formal@none@1@S@The most common entity of interest in that domain has been names of genes and gene products.@@@@1@17@@danf@17-8-2009 10570110@unknown@formal@none@1@S@==Named entity types==@@@@1@3@@danf@17-8-2009 10570120@unknown@formal@none@1@S@In the expression ''named entity'', the word ''named'' restricts the task to those entities for which one or many [[rigid designator]]s, as defined by [[Saul Kripke|Kripke]], stand for the referent.@@@@1@30@@danf@17-8-2009 10570130@unknown@formal@none@1@S@For instance, the ''automotive company created by Henry Ford in 1903'' is referred to as ''Ford'' or ''Ford Motor Company''.@@@@1@20@@danf@17-8-2009 10570140@unknown@formal@none@1@S@Rigid designators include proper names as well as certain natural kind terms like biological species and substances.@@@@1@17@@danf@17-8-2009 10570150@unknown@formal@none@1@S@There is a general agreement to include [[temporal expressions]] and some numerical expressions such as money and measures in named entities.@@@@1@21@@danf@17-8-2009 10570160@unknown@formal@none@1@S@While some instances of these types are good examples of rigid designators (e.g., the year 2001), there are also many invalid ones (e.g., I take my vacations in “June”).@@@@1@29@@danf@17-8-2009 10570170@unknown@formal@none@1@S@In the first case, the year ''2001'' refers to the ''2001st year of the Gregorian calendar''.@@@@1@16@@danf@17-8-2009 10570180@unknown@formal@none@1@S@In the second case, the month ''June'' may refer to the month of an undefined year (''past June'', ''next June'', ''June 2020'', etc.).@@@@1@23@@danf@17-8-2009 10570190@unknown@formal@none@1@S@It is arguable that the named entity definition is loosened in such cases for practical reasons.@@@@1@16@@danf@17-8-2009 10570200@unknown@formal@none@1@S@At least two [[Hierarchy|hierarchies]] of named entity types
have been proposed in the literature.@@@@1@14@@danf@17-8-2009 10570210@unknown@formal@none@1@S@[[BBN Technologies|BBN]] categories [http://www.ldc.upenn.edu/Catalog/docs/LDC2005T33/BBN-Types-Subtypes.html], proposed in 2002, is used for [[Question Answering]] and consists of 29 types and 64 subtypes.@@@@1@20@@danf@17-8-2009 10570220@unknown@formal@none@1@S@Sekine's extended hierarchy [http://nlp.cs.nyu.edu/ene/], proposed in 2002, is made of 200 subtypes.@@@@1@12@@danf@17-8-2009 10570230@unknown@formal@none@1@S@==Evaluation==@@@@1@1@@danf@17-8-2009 10570240@unknown@formal@none@1@S@Benchmarking and evaluations have been performed in the ''[[Message Understanding Conference]]s'' (MUC) organized by [[DARPA]], ''International Conference on Language Resources and Evaluation (LREC)'', ''Computational Natural Language Learning ([[CoNLL]])'' workshops, ''Automatic Content Extraction'' (ACE) organized by [[NIST]], the ''[[Multilingual Entity Task Conference]]'' (MET), ''Information Retrieval and Extraction Exercise'' (IREX) and in ''HAREM'' (Portuguese language only).@@@@1@54@@danf@17-8-2009 10570250@unknown@formal@none@1@S@[http://aclweb.org/aclwiki/index.php?title=Named_Entity_Recognition_%28State_of_the_art%29 State-of-the-art systems] produce near-human performance.@@@@1@6@@danf@17-8-2009 10570260@unknown@formal@none@1@S@For instance, the best system entering [http://www.itl.nist.gov/iad/894.02/related_projects/muc/proceedings/muc_7_toc.html MUC-7] scored 93.39% of [[Information_retrieval#F-measure|f-measure]] while human annotators scored 97.60% and 96.95%.@@@@1@19@@danf@17-8-2009 10580010@unknown@formal@none@1@S@
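To make the evaluation figures above concrete, the sketch below shows one common way an f-measure of the kind quoted for MUC-7 can be computed, by exact matching of (start offset, end offset, type) entity spans. It is an editorial illustration in Python, not the actual MUC scoring software (which also awards partial credit); the gold and predicted annotations are invented toy data.

<source lang="python">
# Toy illustration of how an f-measure like the scores quoted above can be
# computed for named entity recognition, using exact matching of
# (start, end, type) spans.  Real MUC scoring is more involved than this.

def f_measure(gold, predicted):
    """Return (precision, recall, F1) for two sets of entity annotations."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

if __name__ == "__main__":
    # Entities for "Jim bought 300 shares of Acme Corp. in 2006." as
    # (start offset, end offset, type) tuples; invented example data.
    gold = {(0, 3, "PERSON"), (11, 14, "QUANTITY"),
            (25, 35, "ORGANIZATION"), (39, 43, "DATE")}
    predicted = {(0, 3, "PERSON"), (25, 35, "ORGANIZATION"), (39, 43, "DATE")}
    print("precision=%.2f recall=%.2f F1=%.2f" % f_measure(gold, predicted))
</source>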
Natural language
@@@@1@2@@danf@17-8-2009 10580020@unknown@formal@none@1@S@In the [[philosophy of language]], a '''natural language''' (or '''ordinary language''') is a [[language]] that is spoken, [[writing|written]], or [[Sign language|signed]] by [[human]]s for general-purpose communication, as distinguished from [[formal language]]s (such as [[Programming language|computer-programming languages]] or the "languages" used in the study of formal [[logic]], especially [[mathematical logic]]) and from [[constructed language]]s.@@@@1@53@@danf@17-8-2009 10580030@unknown@formal@none@1@S@== Defining natural language ==@@@@1@5@@danf@17-8-2009 10580040@unknown@formal@none@1@S@Though the exact definition is debatable, natural language is often contrasted with artificial or [[constructed languages]] such as [[Esperanto]], [[Latino sine Flexione]], and [[Occidental language|Occidental]].@@@@1@24@@danf@17-8-2009 10580050@unknown@formal@none@1@S@Linguists have an incomplete understanding of all aspects of the rules underlying natural languages, and these rules are therefore objects of study.@@@@1@22@@danf@17-8-2009 10580060@unknown@formal@none@1@S@The understanding of natural languages reveals much about not only how language works (in terms of [[syntax]], [[semantics]], [[phonetics]], [[phonology]], etc.), but also about how the human [[mind]] and the human [[brain]] process language.@@@@1@34@@danf@17-8-2009 10580070@unknown@formal@none@1@S@In linguistic terms, 'natural language' only applies to a language that has evolved naturally, and the study of natural language primarily involves native (first language) speakers.@@@@1@26@@danf@17-8-2009 10580080@unknown@formal@none@1@S@The theory of [[universal grammar]] proposes that all natural languages have certain underlying rules which constrain the structure of the specific grammar for any given language.@@@@1@26@@danf@17-8-2009 10580090@unknown@formal@none@1@S@While [[grammarians]], writers of dictionaries, and language policy-makers all have a certain influence on the evolution of language, their ability to influence what people think they 'ought' to say is distinct from what people actually say.@@@@1@36@@danf@17-8-2009 10580100@unknown@formal@none@1@S@Natural language applies to the latter, and is thus a 'descriptive' rather than a 'prescriptive' term.@@@@1@16@@danf@17-8-2009 10580110@unknown@formal@none@1@S@Thus non-standard language varieties (such as [[African American Vernacular English]]) are considered to be natural, while standard language varieties (such as [[Standard American English]]), which are more 'prescribed', can be considered to be at least somewhat artificial or constructed.@@@@1@39@@danf@17-8-2009 10580120@unknown@formal@none@1@S@== Native language learning ==@@@@1@5@@danf@17-8-2009 10580130@unknown@formal@none@1@S@The [[learning]] of one's own [[native language]], typically that of one's [[parent]]s, normally occurs spontaneously in early human [[childhood]] and is [[Biology|biologically]] driven.@@@@1@23@@danf@17-8-2009 10580140@unknown@formal@none@1@S@A crucial role of this process is performed by the [[Nervous system|neural]] activity of a portion of the human [[brain]] known as [[Broca's area]].@@@@1@24@@danf@17-8-2009 10580150@unknown@formal@none@1@S@There are approximately 7,000 current human languages, and many, if not most, seem to share certain properties, leading to the belief in the existence of [[Universal Grammar]], as shown by [[generative grammar]] studies pioneered by the work of [[Noam Chomsky]].@@@@1@40@@danf@17-8-2009
10580160@unknown@formal@none@1@S@Recently, it has been demonstrated that a dedicated network in the human brain (crucially involving [[Broca's area]], a portion of the left inferior frontal gyrus) is selectively activated by complex verbal structures (but not simple ones) of those languages that meet the Universal Grammar requirements.@@@@1@45@@danf@17-8-2009 10580170@unknown@formal@none@1@S@== Origins of natural language ==@@@@1@6@@danf@17-8-2009 10580180@unknown@formal@none@1@S@There is disagreement among anthropologists on when language was first used by humans (or their ancestors).@@@@1@16@@danf@17-8-2009 10580190@unknown@formal@none@1@S@Estimates range from about two million (2,000,000) years ago, during the time of ''[[Homo habilis]]'', to as recently as forty thousand (40,000) years ago, during the time of [[Cro-Magnon]] man.@@@@1@30@@danf@17-8-2009 10580200@unknown@formal@none@1@S@However, recent evidence suggests that modern human language developed in Africa prior to the dispersal of humans from Africa around 50,000 years ago.@@@@1@25@@danf@17-8-2009 10580210@unknown@formal@none@1@S@Since all people, including the most isolated indigenous groups such as the [[Andamanese]] or the [[Tasmanian aboriginals]], possess language, it must have been present in the ancestral populations in Africa before the human population split into various groups to colonize the rest of the world.@@@@1@46@@danf@17-8-2009 10580220@unknown@formal@none@1@S@Some claim that all natural languages descend from one single language, known as [[Adamic]].@@@@1@15@@danf@17-8-2009 10580230@unknown@formal@none@1@S@== Linguistic diversity ==@@@@1@4@@danf@17-8-2009 10580240@unknown@formal@none@1@S@As of early 2007, there are 6,912 known living human languages.@@@@1@11@@danf@17-8-2009 10580250@unknown@formal@none@1@S@A "living language" is simply one which is in wide use by a specific group of living people.@@@@1@18@@danf@17-8-2009 10580260@unknown@formal@none@1@S@The exact number of known living languages varies between 5,000 and 10,000, depending generally on the precision of one's definition of "language", and in particular on how one classifies [[dialects]].@@@@1@31@@danf@17-8-2009 10580270@unknown@formal@none@1@S@There are also many dead or [[extinct language]]s.@@@@1@8@@danf@17-8-2009 10580280@unknown@formal@none@1@S@There is no [[dialect#.22Dialect.22 or .22language.22|clear distinction]] between a language and a [[dialect]], notwithstanding linguist [[Max Weinreich]]'s famous [[aphorism]] that "[[a language is a dialect with an army and navy]]."@@@@1@30@@danf@17-8-2009 10580290@unknown@formal@none@1@S@In other words, the distinction may hinge on political considerations as much as on cultural differences, distinctive [[writing system]]s, or degree of [[mutual intelligibility]].@@@@1@24@@danf@17-8-2009 10580300@unknown@formal@none@1@S@It is probably impossible to accurately enumerate the living languages because our worldwide knowledge is incomplete, and it is a "moving target", as explained in greater detail by the [[Ethnologue]]'s Introduction, p. 
7 - 8.@@@@1@35@@danf@17-8-2009 10580310@unknown@formal@none@1@S@With the 15th edition, the 103 newly added languages are not new but reclassified due to refinements in the definition of language.@@@@1@22@@danf@17-8-2009 10580320@unknown@formal@none@1@S@Although widely considered an [[encyclopedia]], the [[Ethnologue]] actually presents itself as an incomplete catalog, including only named languages that its editors are able to document.@@@@1@25@@danf@17-8-2009 10580330@unknown@formal@none@1@S@With each edition, the number of catalogued languages has grown.@@@@1@10@@danf@17-8-2009 10580340@unknown@formal@none@1@S@Beginning with the 14th edition (2000), an attempt was made to include all known living languages.@@@@1@16@@danf@17-8-2009 10580350@unknown@formal@none@1@S@SIL used an internal 3-letter code fashioned after [[airport code]]s to identify languages.@@@@1@13@@danf@17-8-2009 10580360@unknown@formal@none@1@S@This was the precursor to the modern [[ISO 639-3]] standard, to which SIL contributed.@@@@1@14@@danf@17-8-2009 10580370@unknown@formal@none@1@S@The standard allows for over 14,000 languages.@@@@1@7@@danf@17-8-2009 10580380@unknown@formal@none@1@S@In turn, the 15th edition was revised to conform to the pending ISO 639-3 standard.@@@@1@15@@danf@17-8-2009 10580390@unknown@formal@none@1@S@Of the catalogued languages, 497 have been flagged as "nearly extinct" due to trends in their usage.@@@@1@17@@danf@17-8-2009 10580400@unknown@formal@none@1@S@Per the 15th edition, 6,912 living languages are shared by over 5.7 billion speakers. (p. 15)@@@@1@16@@danf@17-8-2009 10580410@unknown@formal@none@1@S@== Taxonomy ==@@@@1@3@@danf@17-8-2009 10580420@unknown@formal@none@1@S@The [[Taxonomic classification|classification]] of natural languages can be performed on the basis of different underlying principles (different closeness notions, respecting different properties and relations between languages); important directions of present classifications are:@@@@1@32@@danf@17-8-2009 10580430@unknown@formal@none@1@S@* paying attention to the historical evolution of languages results in a genetic classification of languages—which is based on genetic relatedness of languages,@@@@1@23@@danf@17-8-2009 10580440@unknown@formal@none@1@S@* paying attention to the internal structure of languages ([[grammar]]) results in a typological classification of languages—which is based on similarity of one or more components of the language's grammar across languages,@@@@1@32@@danf@17-8-2009 10580450@unknown@formal@none@1@S@* and respecting geographical closeness and contacts between language-speaking communities results in areal groupings of languages.@@@@1@16@@danf@17-8-2009 10580460@unknown@formal@none@1@S@The different classifications do not match each other and are not expected to, but the correlation between them is an important point for many [[linguistics|linguistic]] research works.@@@@1@27@@danf@17-8-2009 10580470@unknown@formal@none@1@S@(There is a parallel to the classification of [[species]] in biological [[phylogenetics]] here: consider [[monophyletic]] vs. 
[[polyphyletic]] groups of species.)@@@@1@20@@danf@17-8-2009 10580480@unknown@formal@none@1@S@The task of genetic classification belongs to the field of [[historical-comparative linguistics]]; the task of typological classification belongs to [[linguistic typology]].@@@@1@16@@danf@17-8-2009 10580490@unknown@formal@none@1@S@See also [[Taxonomy]] and [[Taxonomic classification]] for the general idea of classification and taxonomies.@@@@1@14@@danf@17-8-2009 10580500@unknown@formal@none@1@S@==== Genetic classification ====@@@@1@4@@danf@17-8-2009 10580510@unknown@formal@none@1@S@The world's languages have been grouped into families of languages that are believed to have common ancestors.@@@@1@17@@danf@17-8-2009 10580520@unknown@formal@none@1@S@Some of the major families are the [[Indo-European languages]], the [[Afro-Asiatic languages]], the [[Austronesian languages]], and the [[Sino-Tibetan languages]].@@@@1@19@@danf@17-8-2009 10580530@unknown@formal@none@1@S@The shared features of languages from one family can be due to shared ancestry.@@@@1@14@@danf@17-8-2009 10580540@unknown@formal@none@1@S@(Compare with [[homology (biology)|homology]] in biology.)@@@@1@6@@danf@17-8-2009 10580550@unknown@formal@none@1@S@==== Typological classification ====@@@@1@4@@danf@17-8-2009 10580560@unknown@formal@none@1@S@An example of a typological classification is the classification of languages on the basis of the basic order of the [[verb]], the [[subject (grammar)|subject]] and the [[object (grammar)|object]] in a [[sentence (linguistics)|sentence]] into several types, such as [[SVO language|SVO]], [[SOV language|SOV]], and [[VSO language|VSO]] languages.@@@@1@45@@danf@17-8-2009 10580570@unknown@formal@none@1@S@([[English language|English]], for instance, belongs to the [[SVO language]] type.)@@@@1@10@@danf@17-8-2009 10580580@unknown@formal@none@1@S@The shared features of languages of one type (= from one typological class) may have arisen completely independently.@@@@1@18@@danf@17-8-2009 10580590@unknown@formal@none@1@S@(Compare with [[analogy (biology)|analogy]] in biology.)@@@@1@6@@danf@17-8-2009 10580595@unknown@formal@none@1@S@Their co-occurrence might be due to the universal laws governing the structure of natural languages—[[language universal]]s.@@@@1@16@@danf@17-8-2009 10580600@unknown@formal@none@1@S@==== Areal classification ====@@@@1@4@@danf@17-8-2009 10580610@unknown@formal@none@1@S@The following language groupings can serve as some linguistically significant examples of areal linguistic units, or ''[[sprachbund]]s'': the [[Balkan linguistic union]], or the bigger group of [[European languages]]; the [[Caucasian languages]]; the [[East Asian languages]].@@@@1@32@@danf@17-8-2009 10580620@unknown@formal@none@1@S@Although the members of each group are not closely [[genetic relatedness of languages|genetically related]], there is a reason for them to share similar features, namely, their speakers have been in contact for a long time within a common community and the languages ''converged'' in the course of history.@@@@1@49@@danf@17-8-2009 10580630@unknown@formal@none@1@S@These are called "[[areal feature (linguistics)|areal feature]]s".@@@@1@7@@danf@17-8-2009 10580640@unknown@formal@none@1@S@One should be careful about the underlying classification principle for groups of languages which apparently have a geographical name: besides areal linguistic units, the [[taxa]] of the genetic classification ([[language family|language families]]) are often given names which themselves or parts of which refer to geographical 
areas.@@@@1@46@@danf@17-8-2009 10580650@unknown@formal@none@1@S@== Controlled languages ==@@@@1@4@@danf@17-8-2009 10580660@unknown@formal@none@1@S@Controlled natural languages are subsets of natural languages whose grammars and dictionaries have been restricted in order to reduce or eliminate both ambiguity and complexity.@@@@1@25@@danf@17-8-2009 10580670@unknown@formal@none@1@S@The purpose behind the development and implementation of a controlled natural language typically is to aid non-native speakers of a natural language in understanding it, or to ease computer processing of a natural language.@@@@1@34@@danf@17-8-2009 10580680@unknown@formal@none@1@S@An example of a widely used controlled natural language is [[Simplified English]], which was originally developed for [[aerospace]] industry maintenance manuals.@@@@1@21@@danf@17-8-2009 10580690@unknown@formal@none@1@S@== Constructed languages and international auxiliary languages ==@@@@1@8@@danf@17-8-2009 10580700@unknown@formal@none@1@S@Constructed [[international auxiliary language]]s such as [[Esperanto]] and [[Interlingua]] that have [[native speaker]]s are by some also considered natural languages.@@@@1@20@@danf@17-8-2009 10580710@unknown@formal@none@1@S@However, constructed languages, while they are clearly languages, are not generally considered natural languages.@@@@1@14@@danf@17-8-2009 10580720@unknown@formal@none@1@S@The problem is that other languages have been used to communicate and evolve in a natural way, while Esperanto has been selectively designed by [[L.L. Zamenhof]] from natural languages, not grown from the natural fluctuations in vocabulary and syntax.@@@@1@39@@danf@17-8-2009 10580730@unknown@formal@none@1@S@Nor has Esperanto been naturally "standardized" by children's natural tendency to correct for illogical grammar structures in their parents' language, which can be seen in the development of [[pidgin]] languages into [[creole language]]s (as explained by Steven Pinker in [[The Language Instinct]]).@@@@1@42@@danf@17-8-2009 10580740@unknown@formal@none@1@S@The possible exception to this are true native speakers of such languages.@@@@1@12@@danf@17-8-2009 10580750@unknown@formal@none@1@S@More substantive basis for this designation is that the vocabulary, grammar, and orthography of Interlingua are natural; they have been standardized and presented by a [[International Auxiliary Language Association|linguistic research body]], but they predated it and are not themselves considered a product of human invention.@@@@1@45@@danf@17-8-2009 10580760@unknown@formal@none@1@S@Most experts, however, consider Interlingua to be naturalistic rather than natural.@@@@1@11@@danf@17-8-2009 10580770@unknown@formal@none@1@S@[[Latino Sine Flexione]], a second naturalistic auxiliary language, is also naturalistic in content but is no longer widely spoken.@@@@1@19@@danf@17-8-2009 10580780@unknown@formal@none@1@S@==Natural Language Processing==@@@@1@3@@danf@17-8-2009 10580790@unknown@formal@none@1@S@Natural language processing (NLP) is a subfield of artificial intelligence and computational linguistics.@@@@1@13@@danf@17-8-2009 10580800@unknown@formal@none@1@S@It studies the problems of automated generation and understanding of natural human languages.@@@@1@13@@danf@17-8-2009 10580810@unknown@formal@none@1@S@Natural-language-generation systems convert information from computer databases into normal-sounding human language.@@@@1@11@@danf@17-8-2009 10580820@unknown@formal@none@1@S@Natural-language-understanding systems convert samples of human 
language into more formal representations that are easier for computer programs to manipulate.@@@@1@19@@danf@17-8-2009 10580830@unknown@formal@none@1@S@== Modalities ==@@@@1@3@@danf@17-8-2009 10580840@unknown@formal@none@1@S@Natural language manifests itself in modalities other than speech.@@@@1@9@@danf@17-8-2009 10580850@unknown@formal@none@1@S@=== Sign languages ===@@@@1@4@@danf@17-8-2009 10580860@unknown@formal@none@1@S@In linguistic terms, sign languages are as rich and complex as any oral language, despite the previously common misconception that they are not "real languages".@@@@1@25@@danf@17-8-2009 10580870@unknown@formal@none@1@S@Professional linguists have studied many sign languages and found them to have every linguistic component required to be classed as true natural languages.@@@@1@23@@danf@17-8-2009 10580880@unknown@formal@none@1@S@Sign languages are not [[pantomime]], much as most spoken language is not [[onomatopoeic]].@@@@1@13@@danf@17-8-2009 10580890@unknown@formal@none@1@S@The signs do tend to exploit iconicity (visual connections with their referents) more than what is common in spoken language, but they are above all conventional and hence generally incomprehensible to non-speakers, just like spoken words and morphemes.@@@@1@38@@danf@17-8-2009 10580900@unknown@formal@none@1@S@They are not a visual rendition of an oral language either.@@@@1@11@@danf@17-8-2009 10580910@unknown@formal@none@1@S@They have complex grammars of their own, and can be used to discuss any topic, from the simple and concrete to the lofty and abstract.@@@@1@25@@danf@17-8-2009 10580920@unknown@formal@none@1@S@=== Written languages ===@@@@1@4@@danf@17-8-2009 10580930@unknown@formal@none@1@S@In a sense, written language should be distinguished from natural language.@@@@1@11@@danf@17-8-2009 10580940@unknown@formal@none@1@S@Until recently in the developed world, it was common for many people to be fluent in [[spoken language|spoken]] or [[sign language|signed languages]] and yet remain illiterate; this is still the case in poor countries today.@@@@1@35@@danf@17-8-2009 10580950@unknown@formal@none@1@S@Furthermore, natural [[language acquisition]] during childhood is largely spontaneous, while [[literacy]] must usually be intentionally acquired.@@@@1@16@@danf@17-8-2009 10590010@unknown@formal@none@1@S@
Natural language processing
@@@@1@3@@danf@17-8-2009 10590020@unknown@formal@none@1@S@'''Natural language processing''' ('''NLP''') is a subfield of [[artificial intelligence]] and [[computational linguistics]].@@@@1@13@@danf@17-8-2009 10590030@unknown@formal@none@1@S@It studies the problems of automated generation and understanding of [[natural language|natural human languages]].@@@@1@14@@danf@17-8-2009 10590040@unknown@formal@none@1@S@Natural-language-generation systems convert information from computer databases into normal-sounding human language.@@@@1@11@@danf@17-8-2009 10590050@unknown@formal@none@1@S@Natural-language-understanding systems convert samples of human language into more formal representations that are easier for [[computer]] programs to manipulate.@@@@1@19@@danf@17-8-2009 10590060@unknown@formal@none@1@S@==Tasks and limitations==@@@@1@3@@danf@17-8-2009 10590070@unknown@formal@none@1@S@In theory, natural-language processing is a very attractive method of [[human-computer interaction]].@@@@1@12@@danf@17-8-2009 10590080@unknown@formal@none@1@S@Early systems such as [[SHRDLU]], working in restricted "[[blocks world]]s" with restricted vocabularies, worked extremely well, leading researchers to excessive optimism, which was soon lost when the systems were extended to more realistic situations with real-world [[ambiguity]] and [[complexity]].@@@@1@39@@danf@17-8-2009 10590090@unknown@formal@none@1@S@Natural-language understanding is sometimes referred to as an [[AI-complete]] problem, because natural-language recognition seems to require extensive knowledge about the outside world and the ability to manipulate it.@@@@1@28@@danf@17-8-2009 10590100@unknown@formal@none@1@S@The definition of "[[understanding]]" is one of the major problems in natural-language processing.@@@@1@13@@danf@17-8-2009 10590110@unknown@formal@none@1@S@==Concrete problems==@@@@1@2@@danf@17-8-2009 10590120@unknown@formal@none@1@S@Some examples of the problems faced by natural-language-understanding systems:@@@@1@9@@danf@17-8-2009 10590130@unknown@formal@none@1@S@* The sentences ''We gave the monkeys the bananas because they were hungry'' and ''We gave the monkeys the bananas because they were over-ripe'' have the same surface grammatical structure.@@@@1@30@@danf@17-8-2009 10590140@unknown@formal@none@1@S@However, the pronoun ''they'' refers to ''monkeys'' in one sentence and ''bananas'' in the other, and it is impossible to tell which without a knowledge of the properties of monkeys and bananas.@@@@1@32@@danf@17-8-2009 10590150@unknown@formal@none@1@S@* A string of words may be interpreted in different ways.@@@@1@11@@danf@17-8-2009 10590160@unknown@formal@none@1@S@For example, the string ''Time flies like an arrow'' may be interpreted in a variety of ways:@@@@1@17@@danf@17-8-2009 10590170@unknown@formal@none@1@S@**The common [[simile]]: ''[[time]]'' moves quickly just like an arrow does;@@@@1@11@@danf@17-8-2009 10590180@unknown@formal@none@1@S@**measure the speed of flies like you would measure that of an arrow (thus interpreted as an imperative) - i.e. ''(You should) time flies as you would (time) an arrow.'';@@@@1@30@@danf@17-8-2009 10590190@unknown@formal@none@1@S@**measure the speed of flies like an arrow would - i.e. ''Time flies in the same way that an arrow would (time them).'';@@@@1@23@@danf@17-8-2009 10590200@unknown@formal@none@1@S@**measure the speed of flies that are like arrows - i.e. 
''Time those flies that are like arrows'';@@@@1@18@@danf@17-8-2009 10590210@unknown@formal@none@1@S@**all of a type of flying insect, "time-flies," collectively enjoys a single arrow (compare ''Fruit flies like a banana'');@@@@1@19@@danf@17-8-2009 10590220@unknown@formal@none@1@S@**each of a type of flying insect, "time-flies," individually enjoys a different arrow (similar comparison applies);@@@@1@16@@danf@17-8-2009 10590230@unknown@formal@none@1@S@**A concrete object, for example the magazine, ''[[Time (magazine)|Time]]'', travels through the air in an arrow-like manner.@@@@1@17@@danf@17-8-2009 10590240@unknown@formal@none@1@S@English is particularly challenging in this regard because it has little [[inflectional morphology]] to distinguish between [[parts of speech]].@@@@1@19@@danf@17-8-2009 10590250@unknown@formal@none@1@S@* English and several other languages don't specify which word an adjective applies to.@@@@1@14@@danf@17-8-2009 10590260@unknown@formal@none@1@S@For example, in the string "pretty little girls' school".@@@@1@9@@danf@17-8-2009 10590270@unknown@formal@none@1@S@** Does the school look little?@@@@1@6@@danf@17-8-2009 10590280@unknown@formal@none@1@S@** Do the girls look little?@@@@1@6@@danf@17-8-2009 10590290@unknown@formal@none@1@S@** Do the girls look pretty?@@@@1@6@@danf@17-8-2009 10590300@unknown@formal@none@1@S@** Does the school look pretty?@@@@1@6@@danf@17-8-2009 10590310@unknown@formal@none@1@S@* We will often imply additional information in spoken language by the way we place stress on words.@@@@1@18@@danf@17-8-2009 10590320@unknown@formal@none@1@S@The sentence "I never said she stole my money" demonstrates the importance stress can play in a sentence, and thus the inherent difficulty a natural language processor can have in parsing it.@@@@1@32@@danf@17-8-2009 10590330@unknown@formal@none@1@S@Depending on which word the speaker places the stress, this sentence could have several distinct meanings:@@@@1@16@@danf@17-8-2009 10590340@unknown@formal@none@1@S@** "'''I''' never said she stole my money" - Someone else said it, but ''I'' didn't.@@@@1@16@@danf@17-8-2009 10590350@unknown@formal@none@1@S@** "I '''never''' said she stole my money" - I simply didn't ever say it.@@@@1@15@@danf@17-8-2009 10590360@unknown@formal@none@1@S@** "I never '''said''' she stole my money" - I might have implied it in some way, but I never explicitly said it.@@@@1@23@@danf@17-8-2009 10590370@unknown@formal@none@1@S@** "I never said '''she''' stole my money" - I said someone took it; I didn't say it was she.@@@@1@20@@danf@17-8-2009 10590380@unknown@formal@none@1@S@** "I never said she '''stole''' my money" - I just said she probably borrowed it.@@@@1@16@@danf@17-8-2009 10590390@unknown@formal@none@1@S@** "I never said she stole '''my''' money" - I said she stole someone else's money.@@@@1@16@@danf@17-8-2009 10590400@unknown@formal@none@1@S@** "I never said she stole my '''money'''" - I said she stole something, but not my money.@@@@1@18@@danf@17-8-2009 10590410@unknown@formal@none@1@S@==Subproblems==@@@@1@1@@danf@17-8-2009 10590420@unknown@formal@none@1@S@; [[Speech segmentation]]:@@@@1@3@@danf@17-8-2009 10590430@unknown@formal@none@1@S@In most spoken languages, the sounds representing successive letters blend into each other, so the conversion of the analog signal to discrete characters can be a very difficult process.@@@@1@29@@danf@17-8-2009 10590440@unknown@formal@none@1@S@Also, in [[natural speech]] there are hardly any pauses between successive words; the location of those boundaries 
usually must take into account [[grammatical]] and [[semantic]] constraints, as well as the [[context]].@@@@1@31@@danf@17-8-2009 10590450@unknown@formal@none@1@S@; [[Text segmentation]]:@@@@1@3@@danf@17-8-2009 10590460@unknown@formal@none@1@S@Some written languages like [[Chinese language|Chinese]], [[Japanese language|Japanese]] and [[Thai language|Thai]] do not have single-word boundaries either, so any significant text [[parsing]] usually requires the identification of word boundaries, which is often a non-trivial task.@@@@1@35@@danf@17-8-2009 10590470@unknown@formal@none@1@S@; [[Word sense disambiguation]]:@@@@1@4@@danf@17-8-2009 10590480@unknown@formal@none@1@S@Many words have more than one [[meaning]]; we have to select the meaning which makes the most sense in context.@@@@1@20@@danf@17-8-2009 10590490@unknown@formal@none@1@S@; [[Syntactic ambiguity]]:@@@@1@3@@danf@17-8-2009 10590500@unknown@formal@none@1@S@The [[grammar]] for [[natural language]]s is [[ambiguous]], i.e. there are often multiple possible [[parse tree]]s for a given sentence.@@@@1@19@@danf@17-8-2009 10590510@unknown@formal@none@1@S@Choosing the most appropriate one usually requires [[semantics|semantic]] and contextual information.@@@@1@11@@danf@17-8-2009 10590520@unknown@formal@none@1@S@Specific problem components of syntactic ambiguity include [[sentence boundary disambiguation]].@@@@1@10@@danf@17-8-2009 10590530@unknown@formal@none@1@S@; Imperfect or irregular input :@@@@1@6@@danf@17-8-2009 10590540@unknown@formal@none@1@S@Foreign or regional accents and vocal impediments in speech; typing or grammatical errors, [[Optical character recognition|OCR]] errors in texts.@@@@1@19@@danf@17-8-2009 10590550@unknown@formal@none@1@S@; [[Speech acts]] and plans:@@@@1@5@@danf@17-8-2009 10590560@unknown@formal@none@1@S@A sentence can often be considered an action by the speaker.@@@@1@11@@danf@17-8-2009 10590570@unknown@formal@none@1@S@The sentence structure, alone, may not contain enough information to define this action.@@@@1@13@@danf@17-8-2009 10590580@unknown@formal@none@1@S@For instance, a question is actually the speaker requesting some sort of response from the listener.@@@@1@16@@danf@17-8-2009 10590590@unknown@formal@none@1@S@The desired response may be verbal, physical, or some combination.@@@@1@10@@danf@17-8-2009 10590600@unknown@formal@none@1@S@For example, "Can you pass the class?" is a request for a simple yes-or-no answer, while "Can you pass the salt?" 
is requesting a physical action to be performed.@@@@1@29@@danf@17-8-2009 10590610@unknown@formal@none@1@S@It is not appropriate to respond with "Yes, I can pass the salt," without the accompanying action (although "No" or "I can't reach the salt" would explain a lack of action).@@@@1@31@@danf@17-8-2009 10590620@unknown@formal@none@1@S@== Statistical NLP ==@@@@1@4@@danf@17-8-2009 10590630@unknown@formal@none@1@S@Statistical natural-language processing uses [[stochastic]], [[probabilistic]] and [[statistical]] methods to resolve some of the difficulties discussed above, especially those which arise because longer sentences are highly ambiguous when processed with realistic grammars, yielding thousands or millions of possible analyses.@@@@1@39@@danf@17-8-2009 10590640@unknown@formal@none@1@S@Methods for disambiguation often involve the use of [[corpus linguistics | corpora]] and [[Markov model]]s.@@@@1@15@@danf@17-8-2009 10590650@unknown@formal@none@1@S@Statistical NLP comprises all quantitative approaches to automated language processing, including probabilistic modeling, [[information theory]], and [[linear algebra]].@@@@1@18@@danf@17-8-2009 10590660@unknown@formal@none@1@S@The technology for statistical NLP comes mainly from [[machine learning]] and [[data mining]], both of which are fields of [[artificial intelligence]] that involve learning from data.@@@@1@26@@danf@17-8-2009 10590670@unknown@formal@none@1@S@==Major tasks in NLP==@@@@1@4@@danf@17-8-2009 10590680@unknown@formal@none@1@S@* [[Automatic summarization]]@@@@1@3@@danf@17-8-2009 10590690@unknown@formal@none@1@S@* [[Foreign language reading aid]]@@@@1@5@@danf@17-8-2009 10590700@unknown@formal@none@1@S@* [[Foreign language writing aid]]@@@@1@5@@danf@17-8-2009 10590710@unknown@formal@none@1@S@* [[Information extraction]]@@@@1@3@@danf@17-8-2009 10590720@unknown@formal@none@1@S@* [[Information retrieval]]@@@@1@3@@danf@17-8-2009 10590730@unknown@formal@none@1@S@* [[Machine translation]]@@@@1@3@@danf@17-8-2009 10590740@unknown@formal@none@1@S@* [[Named entity recognition]]@@@@1@4@@danf@17-8-2009 10590750@unknown@formal@none@1@S@* [[Natural language generation]]@@@@1@4@@danf@17-8-2009 10590760@unknown@formal@none@1@S@* [[Natural language understanding]]@@@@1@4@@danf@17-8-2009 10590770@unknown@formal@none@1@S@* [[Optical character recognition]]@@@@1@4@@danf@17-8-2009 10590780@unknown@formal@none@1@S@* [[Question answering]]@@@@1@3@@danf@17-8-2009 10590790@unknown@formal@none@1@S@* [[Speech recognition]]@@@@1@3@@danf@17-8-2009 10590800@unknown@formal@none@1@S@* [[Spoken dialogue system]]@@@@1@4@@danf@17-8-2009 10590810@unknown@formal@none@1@S@* [[Text simplification]]@@@@1@3@@danf@17-8-2009 10590820@unknown@formal@none@1@S@* [[Text to speech]]@@@@1@4@@danf@17-8-2009 10590830@unknown@formal@none@1@S@* [[Text-proofing]]@@@@1@2@@danf@17-8-2009 10590840@unknown@formal@none@1@S@== Evaluation of natural language processing ==@@@@1@7@@danf@17-8-2009 10590850@unknown@formal@none@1@S@===Objectives===@@@@1@1@@danf@17-8-2009 10590860@unknown@formal@none@1@S@The goal of NLP evaluation is to measure one or more ''qualities'' of an algorithm or a system, in order to determine if (or to what extent) the system answers the goals of its designers, or the needs of its users.@@@@1@41@@danf@17-8-2009 10590870@unknown@formal@none@1@S@Research in NLP evaluation has received considerable attention, because the definition of proper evaluation criteria is one way to specify precisely an NLP problem, going thus beyond the vagueness of tasks defined only as 
''language understanding'' or ''language generation''.@@@@1@39@@danf@17-8-2009 10590880@unknown@formal@none@1@S@A precise set of evaluation criteria, which includes mainly evaluation data and evaluation metrics, enables several teams to compare their solutions to a given NLP problem.@@@@1@26@@danf@17-8-2009 10590890@unknown@formal@none@1@S@===Short history of evaluation in NLP===@@@@1@6@@danf@17-8-2009 10590900@unknown@formal@none@1@S@The first evaluation campaign on written texts seems to have been a campaign dedicated to message understanding in 1987 (Pallet 1998).@@@@1@20@@danf@17-8-2009 10590910@unknown@formal@none@1@S@Then, the Parseval/GEIG project compared phrase-structure grammars (Black 1991).@@@@1@9@@danf@17-8-2009 10590920@unknown@formal@none@1@S@A series of campaigns within the Tipster project were carried out on tasks like summarization, translation and searching (Hirshman 1998).@@@@1@18@@danf@17-8-2009 10590930@unknown@formal@none@1@S@In 1994, in Germany, the Morpholympics compared German taggers.@@@@1@9@@danf@17-8-2009 10590940@unknown@formal@none@1@S@Then, the Senseval and Romanseval campaigns were conducted with the objective of evaluating semantic disambiguation.@@@@1@14@@danf@17-8-2009 10590950@unknown@formal@none@1@S@In 1996, the Sparkle campaign compared syntactic parsers in four different languages (English, French, German and Italian).@@@@1@17@@danf@17-8-2009 10590960@unknown@formal@none@1@S@In France, the Grace project compared a set of 21 taggers for French in 1997 (Adda 1999).@@@@1@17@@danf@17-8-2009 10590970@unknown@formal@none@1@S@In 2004, during the [[Technolangue/Easy]] project, 13 parsers for French were compared.@@@@1@12@@danf@17-8-2009 10590980@unknown@formal@none@1@S@Large-scale evaluations of dependency parsers were performed in the context of the CoNLL shared tasks in 2006 and 2007.@@@@1@19@@danf@17-8-2009 10590990@unknown@formal@none@1@S@In Italy, the evalita campaign was conducted in 2007 to compare various tools for Italian [http://evalita.itc.it evalita web site].@@@@1@19@@danf@17-8-2009 10591000@unknown@formal@none@1@S@In France, within the ANR-Passage project (end of 2007), 10 parsers for French were compared [http://atoll.inria.fr/passage/ passage web site].@@@@1@19@@danf@17-8-2009 10591010@unknown@formal@none@1@S@Adda G., Mariani J., Paroubek P., Rajman M. (1999). L'action GRACE d'évaluation de l'assignation des parties du discours pour le français. ''Langues'', vol. 2.@@@@1@23@@danf@17-8-2009 10591030@unknown@formal@none@1@S@Black E., Abney S., Flickinger D., Gdaniec C., Grishman R., Harrison P., Hindle D., Ingria R., Jelinek F., Klavans J., Liberman M., Marcus M., Roukos S., Santorini B., Strzalkowski T. (1991). A procedure for quantitatively comparing the syntactic coverage of English grammars. DARPA Speech and Natural Language Workshop.@@@@1@48@@danf@17-8-2009 10591050@unknown@formal@none@1@S@Hirshman L. (1998). Language understanding evaluation: lessons learned from MUC and ATIS. LREC, Granada.@@@@1@14@@danf@17-8-2009 10591070@unknown@formal@none@1@S@Pallet D.S. (1998). The NIST role in automatic speech recognition benchmark tests. LREC, Granada.@@@@1@14@@danf@17-8-2009 10591090@unknown@formal@none@1@S@===Different types of evaluation===@@@@1@4@@danf@17-8-2009 10591100@unknown@formal@none@1@S@Depending on the evaluation procedures, a number of distinctions are traditionally made in NLP evaluation.@@@@1@15@@danf@17-8-2009 10591110@unknown@formal@none@1@S@* Intrinsic vs. 
extrinsic evaluation@@@@1@5@@danf@17-8-2009 10591120@unknown@formal@none@1@S@Intrinsic evaluation considers an isolated NLP system and characterizes its performance mainly with respect to a ''gold standard'' result, pre-defined by the evaluators.@@@@1@23@@danf@17-8-2009 10591130@unknown@formal@none@1@S@Extrinsic evaluation, also called ''evaluation in use'', considers the NLP system in a more complex setting, either as an embedded system or serving a precise function for a human user.@@@@1@30@@danf@17-8-2009 10591140@unknown@formal@none@1@S@The extrinsic performance of the system is then characterized in terms of its utility with respect to the overall task of the complex system or the human user.@@@@1@28@@danf@17-8-2009 10591150@unknown@formal@none@1@S@* Black-box vs. glass-box evaluation@@@@1@5@@danf@17-8-2009 10591160@unknown@formal@none@1@S@Black-box evaluation requires one to run an NLP system on a given data set and to measure a number of parameters related to the quality of the process (speed, reliability, resource consumption) and, most importantly, to the quality of the result (e.g. the accuracy of data annotation or the fidelity of a translation).@@@@1@53@@danf@17-8-2009 10591170@unknown@formal@none@1@S@Glass-box evaluation looks at the design of the system, the algorithms that are implemented, the linguistic resources it uses (e.g. vocabulary size), etc.@@@@1@23@@danf@17-8-2009 10591180@unknown@formal@none@1@S@Given the complexity of NLP problems, it is often difficult to predict performance only on the basis of glass-box evaluation, but this type of evaluation is more informative with respect to error analysis or future developments of a system.@@@@1@39@@danf@17-8-2009 10591190@unknown@formal@none@1@S@* Automatic vs. manual evaluation@@@@1@5@@danf@17-8-2009 10591200@unknown@formal@none@1@S@In many cases, automatic procedures can be defined to evaluate an NLP system by comparing its output with the gold standard (or desired) one.@@@@1@24@@danf@17-8-2009 10591210@unknown@formal@none@1@S@Although the cost of producing the gold standard can be quite high, automatic evaluation can be repeated as often as needed without much additional cost (on the same input data).@@@@1@30@@danf@17-8-2009 10591220@unknown@formal@none@1@S@However, for many NLP problems, the definition of a gold standard is a complex task, and can prove impossible when inter-annotator agreement is insufficient.@@@@1@24@@danf@17-8-2009 10591230@unknown@formal@none@1@S@Manual evaluation is performed by human judges, who are instructed to estimate the quality of a system, or most often of a sample of its output, based on a number of criteria.@@@@1@32@@danf@17-8-2009 10591240@unknown@formal@none@1@S@Although, thanks to their linguistic competence, human judges can be considered the reference for a number of language processing tasks, there is also considerable variation across their ratings.@@@@1@29@@danf@17-8-2009 10591250@unknown@formal@none@1@S@This is why automatic evaluation is sometimes referred to as ''objective'' evaluation, while the human kind appears to be more ''subjective.''@@@@1@21@@danf@17-8-2009 10591260@unknown@formal@none@1@S@=== Shared tasks (Campaigns)===@@@@1@4@@danf@17-8-2009 10591270@unknown@formal@none@1@S@* [[BioCreative]]@@@@1@2@@danf@17-8-2009 10591280@unknown@formal@none@1@S@* [[Message Understanding Conference]]@@@@1@4@@danf@17-8-2009 10591290@unknown@formal@none@1@S@* [[Technolangue/Easy]]@@@@1@2@@danf@17-8-2009 10591300@unknown@formal@none@1@S@* [[Text Retrieval 
Conference]]@@@@1@4@@danf@17-8-2009 10591310@unknown@formal@none@1@S@==Standardization in NLP==@@@@1@3@@danf@17-8-2009 10591320@unknown@formal@none@1@S@An ISO sub-committee is working in order to ease interoperability between [[Lexical resource]]s and NLP programs.@@@@1@16@@danf@17-8-2009 10591330@unknown@formal@none@1@S@The sub-committee is part of [[ISO/TC37]] and is called ISO/TC37/SC4.@@@@1@10@@danf@17-8-2009 10591340@unknown@formal@none@1@S@Some ISO standards are already published but most of them are under construction, mainly on lexicon representation (see [[lexical markup framework|LMF]]), annotation and data category registry.@@@@1@26@@danf@17-8-2009 10600010@unknown@formal@none@1@S@
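The corpus-based, Markov-model methods described in the Statistical NLP section above can be illustrated with a small sketch. The fragment below estimates bigram probabilities from a toy corpus by simple relative-frequency (maximum-likelihood) counting; it is only an illustration under stated assumptions (whitespace tokenization, no smoothing), and the function and variable names are hypothetical rather than taken from any system discussed here.

 from collections import Counter
 
 def bigram_probabilities(sentences):
     """Estimate P(word | previous word) by relative frequency over a toy corpus."""
     unigram_counts, bigram_counts = Counter(), Counter()
     for sentence in sentences:
         tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
         unigram_counts.update(tokens[:-1])
         bigram_counts.update(zip(tokens[:-1], tokens[1:]))
     return {pair: count / unigram_counts[pair[0]]
             for pair, count in bigram_counts.items()}
 
 corpus = ["time flies like an arrow", "fruit flies like a banana"]
 probabilities = bigram_probabilities(corpus)
 print(probabilities[("flies", "like")])  # 1.0 in this two-sentence corpus

A real system would add smoothing for unseen word pairs and work with far larger corpora, but the estimation step is essentially the counting shown here.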
Neural network
@@@@1@2@@danf@17-8-2009 10600020@unknown@formal@none@1@S@Traditionally, the term '''neural network''' had been used to refer to a network or circuit of [[neuron|biological neurons]].@@@@1@18@@danf@17-8-2009 10600030@unknown@formal@none@1@S@The modern usage of the term often refers to [[artificial neural network]]s, which are composed of [[artificial neuron]]s or nodes.@@@@1@20@@danf@17-8-2009 10600040@unknown@formal@none@1@S@Thus the term has two distinct usages:@@@@1@7@@danf@17-8-2009 10600050@unknown@formal@none@1@S@# [[Biological neural network]]s are made up of real biological neurons that are connected or functionally-related in the [[peripheral nervous system]] or the [[central nervous system]].@@@@1@26@@danf@17-8-2009 10600060@unknown@formal@none@1@S@In the field of [[neuroscience]], they are often identified as groups of neurons that perform a specific physiological function in laboratory analysis.@@@@1@22@@danf@17-8-2009 10600070@unknown@formal@none@1@S@# [[Artificial neural network]]s are made up of interconnecting artificial neurons (programming constructs that mimic the properties of biological neurons).@@@@1@20@@danf@17-8-2009 10600080@unknown@formal@none@1@S@Artificial neural networks may either be used to gain an understanding of biological neural networks, or for solving artificial intelligence problems without necessarily creating a model of a real biological system.@@@@1@31@@danf@17-8-2009 10600090@unknown@formal@none@1@S@This article focuses on the relationship between the two concepts; for detailed coverage of the two different concepts refer to the separate articles: [[Biological neural network]] and [[Artificial neural network]].@@@@1@30@@danf@17-8-2009 10600100@unknown@formal@none@1@S@==Characterization==@@@@1@1@@danf@17-8-2009 10600110@unknown@formal@none@1@S@In general a biological neural network is composed of a group or groups of chemically connected or functionally associated neurons.@@@@1@20@@danf@17-8-2009 10600120@unknown@formal@none@1@S@A single neuron may be connected to many other neurons and the total number of neurons and connections in a network may be extensive.@@@@1@24@@danf@17-8-2009 10600130@unknown@formal@none@1@S@Connections, called [[synapses]], are usually formed from [[axons]] to [[dendrites]], though dendrodendritic microcircuits and other connections are possible.@@@@1@18@@danf@17-8-2009 10600140@unknown@formal@none@1@S@Apart from the electrical signaling, there are other forms of signaling that arise from [[neurotransmitter]] diffusion, which have an effect on electrical signaling.@@@@1@23@@danf@17-8-2009 10600150@unknown@formal@none@1@S@As such, neural networks are extremely complex.@@@@1@7@@danf@17-8-2009 10600160@unknown@formal@none@1@S@[[Artificial intelligence]] and [[cognitive modeling]] try to simulate some properties of neural networks.@@@@1@13@@danf@17-8-2009 10600170@unknown@formal@none@1@S@While similar in their techniques, the former has the aim of solving particular tasks, while the latter aims to build mathematical models of biological neural systems.@@@@1@26@@danf@17-8-2009 10600180@unknown@formal@none@1@S@In the [[artificial intelligence]] field, artificial neural networks have been applied successfully to [[speech recognition]], [[image analysis]] and adaptive [[control]], in order to construct [[software agents]] (in [[Video game|computer and video games]]) or [[autonomous robot]]s.@@@@1@35@@danf@17-8-2009 10600190@unknown@formal@none@1@S@Most of the currently employed artificial neural networks for artificial intelligence 
are based on [[statistical estimation]], [[Optimization (mathematics)|optimization]] and [[control theory]].@@@@1@21@@danf@17-8-2009 10600200@unknown@formal@none@1@S@The [[cognitive modelling]] field involves the physical or mathematical modeling of the behaviour of neural systems; ranging from the individual neural level (e.g. modelling the spike response curves of neurons to a stimulus), through the neural cluster level (e.g. modelling the release and effects of dopamine in the basal ganglia) to the complete organism (e.g. behavioural modelling of the organism's response to stimuli).@@@@1@63@@danf@17-8-2009 10600210@unknown@formal@none@1@S@==The brain, neural networks and computers==@@@@1@6@@danf@17-8-2009 10600220@unknown@formal@none@1@S@Neural networks, as used in artificial intelligence, have traditionally been viewed as simplified models of neural processing in the brain, even though the relation between this model and brain biological architecture is debated.@@@@1@33@@danf@17-8-2009 10600230@unknown@formal@none@1@S@A subject of current research in theoretical neuroscience is the question surrounding the degree of complexity and the properties that individual neural elements should have to reproduce something resembling animal intelligence.@@@@1@31@@danf@17-8-2009 10600240@unknown@formal@none@1@S@Historically, computers evolved from the [[von Neumann architecture]], which is based on sequential processing and execution of explicit instructions.@@@@1@19@@danf@17-8-2009 10600250@unknown@formal@none@1@S@On the other hand, the origins of neural networks are based on efforts to model information processing in biological systems, which may rely largely on parallel processing as well as implicit instructions based on recognition of patterns of 'sensory' input from external sources.@@@@1@43@@danf@17-8-2009 10600260@unknown@formal@none@1@S@In other words, at its very heart a neural network is a complex statistical processor (as opposed to being tasked to sequentially process and execute).@@@@1@25@@danf@17-8-2009 10600270@unknown@formal@none@1@S@==Neural networks and artificial intelligence==@@@@1@5@@danf@17-8-2009 10600280@unknown@formal@none@1@S@An ''artificial neural network'' (ANN), also called a ''simulated neural network'' (SNN) or commonly just ''neural network'' (NN) is an interconnected group of [[artificial neuron]]s that uses a [[mathematical model|mathematical or computational model]] for [[information processing]] based on a [[connectionism|connectionistic]] approach to [[computation]].@@@@1@43@@danf@17-8-2009 10600290@unknown@formal@none@1@S@In most cases an ANN is an [[adaptive system]] that changes its structure based on external or internal information that flows through the network.@@@@1@24@@danf@17-8-2009 10600300@unknown@formal@none@1@S@In more practical terms neural networks are [[non-linear]] [[statistical]] [[data modeling]] or [[decision making]] tools.@@@@1@15@@danf@17-8-2009 10600310@unknown@formal@none@1@S@They can be used to model complex relationships between inputs and outputs or to [[Pattern recognition|find patterns]] in data.@@@@1@19@@danf@17-8-2009 10600320@unknown@formal@none@1@S@===Background===@@@@1@1@@danf@17-8-2009 10600330@unknown@formal@none@1@S@An [[artificial neural network]] involves a network of simple processing elements ([[artificial neurons]]) which can exhibit complex global behaviour, determined by the connections between the processing elements and element parameters.@@@@1@30@@danf@17-8-2009 10600340@unknown@formal@none@1@S@One classical type 
of artificial neural network is the [[Hopfield net]].@@@@1@11@@danf@17-8-2009 10600350@unknown@formal@none@1@S@In a neural network model simple [[Node (neural networks)|nodes]], which can be called variously "neurons", "neurodes", "Processing Elements" (PE) or "units", are connected together to form a network of nodes — hence the term "neural network".@@@@1@36@@danf@17-8-2009 10600360@unknown@formal@none@1@S@While a neural network does not have to be adaptive ''per se'', its practical use comes with algorithms designed to alter the strength (weights) of the connections in the network to produce a desired signal flow.@@@@1@36@@danf@17-8-2009 10600370@unknown@formal@none@1@S@In modern [[Neural network software|software implementations]] of artificial neural networks the approach inspired by biology has more or less been abandoned for a more practical approach based on statistics and signal processing.@@@@1@32@@danf@17-8-2009 10600380@unknown@formal@none@1@S@In some of these systems neural networks, or parts of neural networks (such as [[artificial neuron]]s) are used as components in larger systems that combine both adaptive and non-adaptive elements.@@@@1@30@@danf@17-8-2009 10600390@unknown@formal@none@1@S@The concept of a neural network appears to have first been proposed by [[Alan Turing]] in his 1948 paper "Intelligent Machinery".@@@@1@21@@danf@17-8-2009 10600400@unknown@formal@none@1@S@===Applications===@@@@1@1@@danf@17-8-2009 10600410@unknown@formal@none@1@S@The utility of artificial neural network models lies in the fact that they can be used to infer a function from observations and also to use it.@@@@1@27@@danf@17-8-2009 10600420@unknown@formal@none@1@S@This is particularly useful in applications where the complexity of the data or task makes the design of such a function by hand impractical.@@@@1@24@@danf@17-8-2009 10600430@unknown@formal@none@1@S@====Real life applications====@@@@1@3@@danf@17-8-2009 10600440@unknown@formal@none@1@S@The tasks to which artificial neural networks are applied tend to fall within the following broad categories:@@@@1@17@@danf@17-8-2009 10600450@unknown@formal@none@1@S@*[[Function approximation]], or [[regression analysis]], including [[time series prediction]] and modelling.@@@@1@11@@danf@17-8-2009 10600460@unknown@formal@none@1@S@*[[Statistical classification|Classification]], including [[Pattern recognition|pattern]] and sequence recognition, novelty detection and sequential decision making.@@@@1@14@@danf@17-8-2009 10600470@unknown@formal@none@1@S@*[[Data processing]], including filtering, clustering, [[blind signal separation]] and compression.@@@@1@10@@danf@17-8-2009 10600480@unknown@formal@none@1@S@Application areas include system identification and control (vehicle control, process control), game-playing and decision making (backgammon, chess, racing), pattern recognition (radar systems, face identification, object recognition, etc.), sequence recognition (gesture, speech, handwritten text recognition), medical diagnosis, financial applications, [[data mining]] (or knowledge discovery in databases, "KDD"), visualization and [[e-mail spam]] filtering.@@@@1@51@@danf@17-8-2009 10600490@unknown@formal@none@1@S@===Neural network software===@@@@1@3@@danf@17-8-2009 10600500@unknown@formal@none@1@S@''Main article:'' [[Neural network software]]@@@@1@5@@danf@17-8-2009 10600510@unknown@formal@none@1@S@'''Neural network software''' is used to [[Simulation|simulate]], [[research]], [[Software development|develop]] and apply [[artificial neural network]]s, 
[[biological neural network]]s and in some cases a wider array of [[adaptive system]]s.@@@@1@28@@danf@17-8-2009 10600520@unknown@formal@none@1@S@====Learning paradigms====@@@@1@2@@danf@17-8-2009 10600530@unknown@formal@none@1@S@There are three major learning paradigms, each corresponding to a particular abstract learning task.@@@@1@14@@danf@17-8-2009 10600540@unknown@formal@none@1@S@These are [[supervised learning]], [[unsupervised learning]] and [[reinforcement learning]].@@@@1@9@@danf@17-8-2009 10600550@unknown@formal@none@1@S@Usually any given type of network architecture can be employed in any of those tasks.@@@@1@15@@danf@17-8-2009 10600560@unknown@formal@none@1@S@;Supervised learning@@@@1@2@@danf@17-8-2009 10600570@unknown@formal@none@1@S@In [[supervised learning]], we are given a set of example pairs (x, y), x \\in X, y \\in Y, and the aim is to find a function f in the allowed class of functions that matches the examples.@@@@1@39@@danf@17-8-2009 10600580@unknown@formal@none@1@S@In other words, we wish to ''infer'' the mapping implied by the data; the cost function measures the mismatch between our mapping and the data.@@@@1@29@@danf@17-8-2009 10600590@unknown@formal@none@1@S@;Unsupervised learning@@@@1@2@@danf@17-8-2009 10600600@unknown@formal@none@1@S@In [[unsupervised learning]] we are given some data x and a cost function to be minimized, which can be any function of x and the network's output, f.@@@@1@30@@danf@17-8-2009 10600610@unknown@formal@none@1@S@The cost function is determined by the task formulation.@@@@1@9@@danf@17-8-2009 10600620@unknown@formal@none@1@S@Most applications fall within the domain of [[estimation problems]] such as [[statistical modeling]], [[Data compression|compression]], [[Mail filter|filtering]], [[blind source separation]] and [[data clustering|clustering]].@@@@1@23@@danf@17-8-2009 10600630@unknown@formal@none@1@S@;Reinforcement learning@@@@1@2@@danf@17-8-2009 10600640@unknown@formal@none@1@S@In [[reinforcement learning]], data x is usually not given, but generated by an agent's interactions with the environment.@@@@1@18@@danf@17-8-2009 10600650@unknown@formal@none@1@S@At each point in time t, the agent performs an action y_t and the environment generates an observation x_t and an instantaneous cost c_t, according to some (usually unknown) dynamics.@@@@1@30@@danf@17-8-2009 10600660@unknown@formal@none@1@S@The aim is to discover a ''policy'' for selecting actions that minimises some measure of a long-term cost, i.e. 
the expected cumulative cost.@@@@1@23@@danf@17-8-2009 10600670@unknown@formal@none@1@S@The environment's dynamics and the long-term cost for each policy are usually unknown, but can be estimated.@@@@1@17@@danf@17-8-2009 10600680@unknown@formal@none@1@S@ANNs are frequently used in reinforcement learning as part of the overall algorithm.@@@@1@13@@danf@17-8-2009 10600690@unknown@formal@none@1@S@Tasks that fall within the paradigm of reinforcement learning are [[control]] problems, [[game]]s and other [[sequential decision making]] tasks.@@@@1@19@@danf@17-8-2009 10600700@unknown@formal@none@1@S@====Learning algorithms====@@@@1@2@@danf@17-8-2009 10600710@unknown@formal@none@1@S@There are many algorithms for training neural networks; most of them can be viewed as a straightforward application of [[Optimization (mathematics)|optimization]] theory and [[statistical estimation]].@@@@1@25@@danf@17-8-2009 10600720@unknown@formal@none@1@S@[[Evolutionary computation]] methods, [[simulated annealing]], [[Expectation-Maximization|expectation maximization]] and [[non-parametric methods]] are among other commonly used methods for training neural networks.@@@@1@20@@danf@17-8-2009 10600730@unknown@formal@none@1@S@See also [[machine learning]].@@@@1@4@@danf@17-8-2009 10600740@unknown@formal@none@1@S@Recent developments in this field also saw the use of [[particle swarm optimization]] and other [[swarm intelligence]] techniques used in the training of neural networks.@@@@1@25@@danf@17-8-2009 10600750@unknown@formal@none@1@S@==Neural networks and neuroscience==@@@@1@4@@danf@17-8-2009 10600760@unknown@formal@none@1@S@Theoretical and [[computational neuroscience]] is the field concerned with the theoretical analysis and computational modeling of biological neural systems.@@@@1@19@@danf@17-8-2009 10600770@unknown@formal@none@1@S@Since neural systems are intimately related to cognitive processes and behaviour, the field is closely related to cognitive and behavioural modeling.@@@@1@21@@danf@17-8-2009 10600780@unknown@formal@none@1@S@The aim of the field is to create models of biological neural systems in order to understand how biological systems work.@@@@1@21@@danf@17-8-2009 10600790@unknown@formal@none@1@S@To gain this understanding, neuroscientists strive to make a link between observed biological processes (data), biologically plausible mechanisms for neural processing and learning ([[biological neural network]] models) and theory (statistical learning theory and [[information theory]]).@@@@1@35@@danf@17-8-2009 10600800@unknown@formal@none@1@S@=== Types of models ===@@@@1@5@@danf@17-8-2009 10600810@unknown@formal@none@1@S@Many models are used in the field, each defined at a different level of abstraction and trying to model different aspects of neural systems.@@@@1@24@@danf@17-8-2009 10600820@unknown@formal@none@1@S@They range from models of the short-term behaviour of [[biological neuron models|individual neurons]], through models of how the dynamics of neural circuitry arise from interactions between individual neurons, to models of how behaviour can arise from abstract neural modules that represent complete subsystems.@@@@1@43@@danf@17-8-2009 10600830@unknown@formal@none@1@S@These include models of the long-term and short-term plasticity of neural systems and its relation to learning and memory, from the individual neuron to the system level.@@@@1@27@@danf@17-8-2009 10600840@unknown@formal@none@1@S@===Current research===@@@@1@2@@danf@17-8-2009 10600850@unknown@formal@none@1@S@While initially research had 
been concerned mostly with the electrical characteristics of neurons, a particularly important part of the investigation in recent years has been the exploration of the role of [[neuromodulators]] such as [[dopamine]], [[acetylcholine]], and [[serotonin]] on behaviour and learning.@@@@1@42@@danf@17-8-2009 10600860@unknown@formal@none@1@S@[[Biophysics|Biophysical]] models, such as [[BCM theory]], have been important in understanding mechanisms for [[synaptic plasticity]], and have had applications in both computer science and neuroscience.@@@@1@25@@danf@17-8-2009 10600870@unknown@formal@none@1@S@Research is ongoing in understanding the computational algorithms used in the brain, with some recent biological evidence for [[radial basis networks]] and [[neural backpropagation]] as mechanisms for processing data.@@@@1@29@@danf@17-8-2009 10600880@unknown@formal@none@1@S@==History of the neural network analogy==@@@@1@6@@danf@17-8-2009 10600890@unknown@formal@none@1@S@The concept of neural networks started in the late-1800s as an effort to describe how the human mind performed.@@@@1@19@@danf@17-8-2009 10600900@unknown@formal@none@1@S@These ideas started being applied to computational models with the [[Perceptron]].@@@@1@11@@danf@17-8-2009 10600910@unknown@formal@none@1@S@In early 1950s [[Friedrich Hayek]] was one of the first to posit the idea of [[spontaneous order]] in the brain arising out of decentralized networks of simple units (neurons).@@@@1@29@@danf@17-8-2009 10600920@unknown@formal@none@1@S@In the late 1940s, [[Donald Hebb]] made one of the first hypotheses for a mechanism of neural plasticity (i.e. learning), [[Hebbian learning]].@@@@1@22@@danf@17-8-2009 10600930@unknown@formal@none@1@S@Hebbian learning is considered to be a 'typical' unsupervised learning rule and it (and variants of it) was an early model for [[long term potentiation]].@@@@1@25@@danf@17-8-2009 10600940@unknown@formal@none@1@S@The [[Perceptron]] is essentially a linear classifier for classifying data x \\in R^n specified by parameters w \\in R^n, b \\in R and an output function f = w'x + b.@@@@1@32@@danf@17-8-2009 10600950@unknown@formal@none@1@S@Its parameters are adapted with an ad-hoc rule similar to stochastic steepest gradient descent.@@@@1@14@@danf@17-8-2009 10600960@unknown@formal@none@1@S@Because the [[inner product]] is a [[linear operator]] in the input space, the Perceptron can only perfectly classify a set of data for which different classes are [[linearly separable]] in the input space, while it often fails completely for non-separable data.@@@@1@41@@danf@17-8-2009 10600970@unknown@formal@none@1@S@While the development of the algorithm initially generated some enthusiasm, partly because of its apparent relation to biological mechanisms, the later discovery of this inadequacy caused such models to be abandoned until the introduction of non-linear models into the field.@@@@1@40@@danf@17-8-2009 10600980@unknown@formal@none@1@S@The [[Cognitron]] (1975) was an early multilayered neural network with a training algorithm.@@@@1@13@@danf@17-8-2009 10600990@unknown@formal@none@1@S@The actual structure of the network and the methods used to set the interconnection weights change from one neural strategy to another, each with its advantages and disadvantages.@@@@1@28@@danf@17-8-2009 10601000@unknown@formal@none@1@S@Networks can propagate information in one direction only, or they can bounce back and forth until self-activation at a node occurs and the network settles on a final state.@@@@1@29@@danf@17-8-2009 
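The Perceptron described above can be made concrete with a short sketch. The fragment below is a minimal illustration rather than the historical algorithm verbatim: it assumes binary labels in {-1, +1}, a fixed learning rate, and an explicit bias term, and the function and variable names are hypothetical.

 def train_perceptron(samples, labels, epochs=10, learning_rate=0.1):
     """Learn weights w and bias b for the linear classifier sign(w'x + b)."""
     w = [0.0] * len(samples[0])
     b = 0.0
     for _ in range(epochs):
         for x, y in zip(samples, labels):  # y is -1 or +1
             activation = sum(wi * xi for wi, xi in zip(w, x)) + b
             if y * activation <= 0:  # misclassified: move the boundary towards x
                 w = [wi + learning_rate * y * xi for wi, xi in zip(w, x)]
                 b += learning_rate * y
     return w, b
 
 # Linearly separable toy data (an AND-like problem); training converges here.
 X = [(0, 0), (0, 1), (1, 0), (1, 1)]
 y = [-1, -1, -1, 1]
 print(train_perceptron(X, y))

Because the update only reacts to misclassified examples, the procedure converges on linearly separable data such as this toy set and, as noted above, fails to converge when the classes are not linearly separable.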
10601010@unknown@formal@none@1@S@The ability for bi-directional flow of inputs between neurons/nodes was produced with the [[Hopfield net|Hopfield's network]] (1982), and specialization of these node layers for specific purposes was introduced through the first [[hybrid neural network|hybrid network]].@@@@1@35@@danf@17-8-2009 10601020@unknown@formal@none@1@S@The [[connectionism|parallel distributed processing]] of the mid-1980s became popular under the name [[connectionism]].@@@@1@13@@danf@17-8-2009 10601030@unknown@formal@none@1@S@The rediscovery of the [[backpropagation]] algorithm was probably the main reason behind the repopularisation of neural networks after the publication of "Learning Internal Representations by Error Propagation" in 1986 (Though backpropagation itself dates from 1974).@@@@1@35@@danf@17-8-2009 10601040@unknown@formal@none@1@S@The original network utilised multiple layers of weight-sum units of the type f = g(w'x + b), where g was a [[sigmoid function]] or [[logistic function]] such as used in [[logistic regression]].@@@@1@32@@danf@17-8-2009 10601050@unknown@formal@none@1@S@Training was done by a form of stochastic steepest gradient descent.@@@@1@11@@danf@17-8-2009 10601060@unknown@formal@none@1@S@The employment of the chain rule of differentiation in deriving the appropriate parameter updates results in an algorithm that seems to 'backpropagate errors', hence the nomenclature.@@@@1@26@@danf@17-8-2009 10601070@unknown@formal@none@1@S@However it is essentially a form of gradient descent.@@@@1@9@@danf@17-8-2009 10601080@unknown@formal@none@1@S@Determining the optimal parameters in a model of this type is not trivial, and steepest gradient descent methods cannot be relied upon to give the solution without a good starting point.@@@@1@31@@danf@17-8-2009 10601090@unknown@formal@none@1@S@In recent times, networks with the same architecture as the backpropagation network are referred to as [[Multilayer perceptron|Multi-Layer Perceptrons]].@@@@1@19@@danf@17-8-2009 10601100@unknown@formal@none@1@S@This name does not impose any limitations on the type of algorithm used for learning.@@@@1@15@@danf@17-8-2009 10601110@unknown@formal@none@1@S@The backpropagation network generated much enthusiasm at the time and there was much controversy about whether such learning could be implemented in the brain or not, partly because a mechanism for reverse signalling was not obvious at the time, but most importantly because there was no plausible source for the 'teaching' or 'target' signal.@@@@1@54@@danf@17-8-2009 10601120@unknown@formal@none@1@S@==Criticism==@@@@1@1@@danf@17-8-2009 10601130@unknown@formal@none@1@S@[[A. K. 
Dewdney]], a former ''[[Scientific American]]'' columnist, wrote in 1997, ''“Although neural nets do solve a few toy problems, their powers of computation are so limited that I am surprised anyone takes them seriously as a general problem-solving tool.”''@@@@1@40@@danf@17-8-2009 10601140@unknown@formal@none@1@S@(Dewdney, p.82)@@@@1@2@@danf@17-8-2009 10601150@unknown@formal@none@1@S@Arguments against Dewdney's position are that neural nets have been successfully used to solve many complex and diverse tasks, ranging from autonomously flying aircraft[http://www.nasa.gov/centers/dryden/news/NewsReleases/2003/03-49.html] to detecting credit card fraud[http://www.visa.ca/en/about/visabenefits/innovation.cfm].@@@@1@29@@danf@17-8-2009 10601160@unknown@formal@none@1@S@Technology writer [[Roger Bridgman]] commented on Dewdney's statements about neural nets:@@@@1@11@@danf@17-8-2009 10601170@unknown@formal@none@1@S@
Neural networks, for instance, are in the dock not only because they have been hyped to high heaven, (what hasn't?) but also because you could create a successful net without understanding how it worked: the bunch of numbers that captures its behaviour would in all probability be "an opaque, unreadable table...valueless as a scientific resource".@@@@1@56@@danf@17-8-2009 10601180@unknown@formal@none@1@S@In spite of his emphatic declaration that science is not technology, Dewdney seems here to pillory neural nets as bad science when most of those devising them are just trying to be good engineers.@@@@1@34@@danf@17-8-2009 10601190@unknown@formal@none@1@S@An unreadable table that a useful machine could read would still be well worth having.
@@@@1@15@@danf@17-8-2009 10610010@unknown@formal@none@1@S@
N-gram
@@@@1@1@@danf@17-8-2009 10610020@unknown@formal@none@1@S@An '''''n''-gram''' is a sub-sequence of ''n'' items from a given [[sequence]].@@@@1@12@@danf@17-8-2009 10610025@unknown@formal@none@1@S@''n''-grams are used in various areas of statistical [[natural language processing]] and genetic sequence analysis.@@@@1@15@@danf@17-8-2009 10610030@unknown@formal@none@1@S@The items in question can be letters, words or [[base pairs]] according to the application.@@@@1@15@@danf@17-8-2009 10610040@unknown@formal@none@1@S@An ''n''-gram of size 1 is a "[[unigram]]"; size 2 is a "[[bigram]]" (or, more etymologically sound but less commonly used, a "digram"); size 3 is a "[[trigram]]"; and size 4 or more is simply called an "''n''-gram".@@@@1@38@@danf@17-8-2009 10610050@unknown@formal@none@1@S@Some [[language model]]s built from n-grams are "(''n'' − 1)-order [[Markov_chain|Markov model]]s".@@@@1@10@@danf@17-8-2009 10610060@unknown@formal@none@1@S@==Examples==@@@@1@1@@danf@17-8-2009 10610070@unknown@formal@none@1@S@Here are examples of '''''word''''' level 3-grams and 4-grams (and counts of the number of times they appeared) from the [[N-gram#Google_use_of_N-gram|Google n-gram corpus]].@@@@1@23@@danf@17-8-2009 10610080@unknown@formal@none@1@S@*ceramics collectables collectibles (55)@@@@1@4@@danf@17-8-2009 10610090@unknown@formal@none@1@S@*ceramics collectables fine (130)@@@@1@4@@danf@17-8-2009 10610100@unknown@formal@none@1@S@*ceramics collected by (52)@@@@1@4@@danf@17-8-2009 10610110@unknown@formal@none@1@S@*ceramics collectible pottery (50)@@@@1@4@@danf@17-8-2009 10610120@unknown@formal@none@1@S@*ceramics collectibles cooking (45)@@@@1@4@@danf@17-8-2009 10610130@unknown@formal@none@1@S@4-grams@@@@1@1@@danf@17-8-2009 10610140@unknown@formal@none@1@S@*serve as the incoming (92)@@@@1@5@@danf@17-8-2009 10610150@unknown@formal@none@1@S@*serve as the incubator (99)@@@@1@5@@danf@17-8-2009 10610160@unknown@formal@none@1@S@*serve as the independent (794)@@@@1@5@@danf@17-8-2009 10610170@unknown@formal@none@1@S@*serve as the index (223)@@@@1@5@@danf@17-8-2009 10610180@unknown@formal@none@1@S@*serve as the indication (72)@@@@1@5@@danf@17-8-2009 10610190@unknown@formal@none@1@S@*serve as the indicator (120)@@@@1@5@@danf@17-8-2009 10610200@unknown@formal@none@1@S@==''n''-gram models==@@@@1@2@@danf@17-8-2009 10610210@unknown@formal@none@1@S@An '''''n''-gram model''' models sequences, notably natural languages, using the statistical properties of ''n''-grams.@@@@1@14@@danf@17-8-2009 10610220@unknown@formal@none@1@S@This idea can be traced to an experiment from [[Claude Shannon]]'s work in [[information theory]].@@@@1@15@@danf@17-8-2009 10610230@unknown@formal@none@1@S@His question was, given a sequence of letters (for example, the sequence "for ex"), what is the [[likelihood]] of the next letter?@@@@1@22@@danf@17-8-2009 10610240@unknown@formal@none@1@S@From training data, one can derive a [[probability distribution]] for the next letter given a history of size ''n'': ''a'' = 0.4, ''b'' = 0.00001, ''c'' = 0, ...; where the probabilities of all possible "next-letters" sum to 1.0.@@@@1@39@@danf@17-8-2009 10610250@unknown@formal@none@1@S@More concisely, an ''n''-gram model predicts x_{i} based on x_{i-1}, x_{i-2}, \dots, x_{i-(n-1)}.@@@@1@13@@danf@17-8-2009 10610260@unknown@formal@none@1@S@In probability terms, this is P(x_{i} \mid x_{i-1}, x_{i-2}, \dots, x_{i-(n-1)}).@@@@1@13@@danf@17-8-2009 10610270@unknown@formal@none@1@S@When used for [[language model|language modeling]], independence assumptions are made so that each
word depends only on the last ''n'' − 1 words.@@@@1@21@@danf@17-8-2009 10610280@unknown@formal@none@1@S@This [[Markov model]] is used as an approximation of the true underlying language.@@@@1@13@@danf@17-8-2009 10610290@unknown@formal@none@1@S@This assumption is important because it massively simplifies the problem of learning the language model from data.@@@@1@17@@danf@17-8-2009 10610300@unknown@formal@none@1@S@In addition, because of the open nature of language, it is common to group words unknown to the language model together.@@@@1@21@@danf@17-8-2009 10610310@unknown@formal@none@1@S@''n''-gram models are widely used in statistical [[natural language processing]].@@@@1@10@@danf@17-8-2009 10610320@unknown@formal@none@1@S@In [[speech recognition]], [[phonemes]] and sequences of phonemes are modeled using an ''n''-gram distribution.@@@@1@14@@danf@17-8-2009 10610330@unknown@formal@none@1@S@For parsing, words are modeled such that each ''n''-gram is composed of ''n'' words.@@@@1@14@@danf@17-8-2009 10610340@unknown@formal@none@1@S@For [[language recognition]], sequences of letters are modeled for different languages.@@@@1@11@@danf@17-8-2009 10610350@unknown@formal@none@1@S@For a sequence of words (for example "the dog smelled like a skunk"), the trigrams would be: "the dog smelled", "dog smelled like", "smelled like a", and "like a skunk".@@@@1@30@@danf@17-8-2009 10610360@unknown@formal@none@1@S@For sequences of characters, the 3-grams (sometimes referred to as "trigrams") that can be generated from "good morning" are "goo", "ood", "od ", "d m", " mo", "mor" and so forth.@@@@1@31@@danf@17-8-2009 10610370@unknown@formal@none@1@S@Some practitioners preprocess strings to remove spaces; most simply collapse whitespace to a single space while preserving paragraph marks.@@@@1@19@@danf@17-8-2009 10610380@unknown@formal@none@1@S@Punctuation is also commonly reduced or removed by preprocessing.@@@@1@9@@danf@17-8-2009 10610385@unknown@formal@none@1@S@''n''-grams can also be used for sequences of words or, in fact, for almost any type of data.@@@@1@18@@danf@17-8-2009 10610390@unknown@formal@none@1@S@They have been used, for example, for extracting features for clustering large sets of satellite earth images and for determining what part of the Earth a particular image came from.@@@@1@30@@danf@17-8-2009 10610400@unknown@formal@none@1@S@They have also been very successful as the first pass in genetic sequence search and in the identification of which species short sequences of DNA were taken from.@@@@1@28@@danf@17-8-2009 10610410@unknown@formal@none@1@S@N-gram models are often criticized because they lack any explicit representation of long-range dependency.@@@@1@15@@danf@17-8-2009 10610420@unknown@formal@none@1@S@While it is true that the only explicit dependency range is (''n'' − 1) tokens for an n-gram model, it is also true that the effective range of dependency is significantly longer than this, although long-range correlations drop exponentially with distance for any Markov model.@@@@1@44@@danf@17-8-2009 10610430@unknown@formal@none@1@S@Alternative Markov language models that incorporate some degree of local state can exhibit very long-range dependencies.@@@@1@17@@danf@17-8-2009 10610440@unknown@formal@none@1@S@This is often done using hand-crafted state variables that represent, for instance, the position in a sentence, the general topic of discourse or a grammatical state variable.@@@@1@27@@danf@17-8-2009 10610450@unknown@formal@none@1@S@Some of the best parsers of English currently in existence are roughly of
this form.@@@@1@15@@danf@17-8-2009 10610460@unknown@formal@none@1@S@Another criticism that has been leveled is that Markov models of language, including n-gram models, do not explicitly capture the performance/competence distinction introduced by [[Noam Chomsky]].@@@@1@26@@danf@17-8-2009 10610470@unknown@formal@none@1@S@This criticism fails to explain why parsers that are the best at parsing text seem to uniformly lack any such distinction and most even lack any clear distinction between semantics and syntax.@@@@1@32@@danf@17-8-2009 10610480@unknown@formal@none@1@S@Most proponents of n-gram and related language models opt for a fairly pragmatic approach to language modeling that emphasizes empirical results over theoretical purity.@@@@1@24@@danf@17-8-2009 10610490@unknown@formal@none@1@S@==''n''-grams for approximate matching==@@@@1@4@@danf@17-8-2009 10610500@unknown@formal@none@1@S@''n''-grams can also be used for efficient approximate matching.@@@@1@9@@danf@17-8-2009 10610510@unknown@formal@none@1@S@By converting a sequence of items to a set of ''n''-grams, it can be embedded in a [[vector space]] (in other words, represented as a [[histogram]]), thus allowing the sequence to be compared to other sequences in an efficient manner.@@@@1@40@@danf@17-8-2009 10610520@unknown@formal@none@1@S@For example, if we convert strings with only letters in the English alphabet into 3-grams, we get a 26^3-dimensional space (the first dimension measures the number of occurrences of "aaa", the second "aab", and so forth for all possible combinations of three letters).@@@@1@43@@danf@17-8-2009 10610530@unknown@formal@none@1@S@Using this representation, we lose information about the string.@@@@1@9@@danf@17-8-2009 10610540@unknown@formal@none@1@S@For example, both the strings "abcba" and "bcbab" give rise to exactly the same 2-grams.@@@@1@15@@danf@17-8-2009 10610550@unknown@formal@none@1@S@However, we know empirically that if two strings of real text have a similar vector representation (as measured by [[dot product|cosine distance]]) then they are likely to be similar.@@@@1@29@@danf@17-8-2009 10610560@unknown@formal@none@1@S@Other metrics have also been applied to vectors of ''n''-grams with varying, sometimes better, results.@@@@1@15@@danf@17-8-2009 10610570@unknown@formal@none@1@S@For example [[z-score]]s have been used to compare documents by examining how many standard deviations each ''n''-gram differs from its mean occurrence in a large collection, or [[text corpus]], of documents (which form the "background" vector).@@@@1@36@@danf@17-8-2009 10610580@unknown@formal@none@1@S@In the event of small counts, the [[g-score]] may give better results for comparing alternative models.@@@@1@16@@danf@17-8-2009 10610590@unknown@formal@none@1@S@It is also possible to take a more principled approach to the statistics of ''n''-grams, modeling similarity as the likelihood that two strings came from the same source directly in terms of a problem in [[Bayesian inference]].@@@@1@37@@danf@17-8-2009 10610600@unknown@formal@none@1@S@==Other applications==@@@@1@2@@danf@17-8-2009 10610610@unknown@formal@none@1@S@''n''-grams find use in several areas of computer science, [[computational linguistics]], and applied mathematics.@@@@1@14@@danf@17-8-2009 10610620@unknown@formal@none@1@S@They have been used to:@@@@1@5@@danf@17-8-2009 10610630@unknown@formal@none@1@S@* design [[kernel (mathematics)|kernels]] that allow [[machine learning]] algorithms such as [[support vector machine]]s to learn from string data@@@@1@19@@danf@17-8-2009 
10610640@unknown@formal@none@1@S@* find likely candidates for the correct spelling of a misspelled word@@@@1@12@@danf@17-8-2009 10610650@unknown@formal@none@1@S@* improve compression in [[data compression|compression algorithms]] where a small area of data requires ''n''-grams of greater length@@@@1@18@@danf@17-8-2009 10610660@unknown@formal@none@1@S@* assess the probability of a given word sequence appearing in text of a language of interest in pattern recognition systems, [[speech recognition]], OCR ([[optical character recognition]]), [[Intelligent Character Recognition]] ([[ICR]]), [[machine translation]] and similar applications@@@@1@36@@danf@17-8-2009 10610670@unknown@formal@none@1@S@* improve retrieval in [[information retrieval]] systems when it is hoped to find similar "documents" (a term for which the conventional meaning is sometimes stretched, depending on the data set) given a single query document and a database of reference documents@@@@1@41@@danf@17-8-2009 10610680@unknown@formal@none@1@S@* improve retrieval performance in genetic sequence analysis as in the [[BLAST]] family of programs@@@@1@15@@danf@17-8-2009 10610690@unknown@formal@none@1@S@* identify the language a text is in or the species a small sequence of DNA was taken from@@@@1@19@@danf@17-8-2009 10610700@unknown@formal@none@1@S@* predict letters or words at random in order to create text, as in the [[dissociated press]] algorithm.@@@@1@18@@danf@17-8-2009 10610710@unknown@formal@none@1@S@== Bias-versus-variance trade-off ==@@@@1@4@@danf@17-8-2009 10610720@unknown@formal@none@1@S@What goes into picking the ''n'' for the ''n''-gram?@@@@1@9@@danf@17-8-2009 10610730@unknown@formal@none@1@S@There is a problem of balancing the weight given to ''infrequent grams'' (for example, if a proper name appeared in the training data) against the weight given to ''frequent grams''.@@@@1@23@@danf@17-8-2009 10610740@unknown@formal@none@1@S@Also, items not seen in the training data will be given a [[probability]] of 0.0 without [[smoothing]].@@@@1@17@@danf@17-8-2009 10610750@unknown@formal@none@1@S@For unseen but plausible data from a sample, one can introduce [[pseudocount]]s.@@@@1@12@@danf@17-8-2009 10610760@unknown@formal@none@1@S@Pseudocounts are generally motivated on Bayesian grounds.@@@@1@7@@danf@17-8-2009 10610770@unknown@formal@none@1@S@=== Smoothing techniques ===@@@@1@4@@danf@17-8-2009 10610780@unknown@formal@none@1@S@* [[Linear interpolation]] (e.g., taking the [[weighted mean]] of the unigram, bigram, and trigram)@@@@1@14@@danf@17-8-2009 10610790@unknown@formal@none@1@S@* [[Good-Turing]] discounting@@@@1@3@@danf@17-8-2009 10610800@unknown@formal@none@1@S@* [[Witten-Bell discounting]]@@@@1@3@@danf@17-8-2009 10610810@unknown@formal@none@1@S@* [[Katz's back-off model]] (trigram)@@@@1@5@@danf@17-8-2009 10610820@unknown@formal@none@1@S@==Google use of N-gram==@@@@1@4@@danf@17-8-2009 10610830@unknown@formal@none@1@S@[[Google]] uses n-gram models for a variety of R&D projects, such as [[statistical machine translation]], [[speech recognition]], [[Spell checker|checking spelling]], [[entity detection]], and [[information extraction|data mining]].@@@@1@26@@danf@17-8-2009 10610840@unknown@formal@none@1@S@In September 2006 [http://googleresearch.blogspot.com/2006/08/all-our-n-gram-are-belong-to-you.html Google announced] that they made their n-grams [http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T13 public] at the [[Linguistic Data Consortium]] ([http://www.ldc.upenn.edu/ LDC]).@@@@1@21@@danf@17-8-2009 10620010@unknown@formal@none@1@S@
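As a concrete illustration of the ''n''-gram model, the approximate-matching idea, and the pseudocount smoothing discussed in the sections above, the following Python sketch extracts ''n''-grams from a sequence, estimates a smoothed trigram probability P(x_{i} \mid x_{i-2}, x_{i-1}), and compares two strings by the cosine similarity of their character-trigram count vectors. The function names, the add-k pseudocount, and the toy sentences are illustrative assumptions, not part of any published n-gram toolkit.

 # Illustrative n-gram sketch (assumed names and toy data, not a library API).
 from collections import Counter
 from math import sqrt
 
 def ngrams(items, n):
     # all contiguous n-grams (as tuples) in a sequence of items
     return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]
 
 def trigram_probability(corpus_tokens, w1, w2, w3, vocab_size, k=1.0):
     # P(w3 | w1, w2) with an add-k pseudocount over a closed vocabulary,
     # so unseen trigrams receive a small non-zero probability
     tri = Counter(ngrams(corpus_tokens, 3))
     bi = Counter(ngrams(corpus_tokens, 2))
     return (tri[(w1, w2, w3)] + k) / (bi[(w1, w2)] + k * vocab_size)
 
 def cosine_similarity(a, b):
     # approximate matching: compare character 3-gram count vectors
     u, v = Counter(ngrams(a, 3)), Counter(ngrams(b, 3))
     dot = sum(u[g] * v[g] for g in u)
     norm = sqrt(sum(c * c for c in u.values())) * sqrt(sum(c * c for c in v.values()))
     return dot / norm if norm else 0.0
 
 tokens = "the dog smelled like a skunk and the dog ran".split()
 print(ngrams(tokens, 3)[:2])                                   # first two word trigrams
 print(trigram_probability(tokens, "the", "dog", "smelled", len(set(tokens))))
 print(cosine_similarity("good morning", "god morning"))        # high, but below 1.0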
Noun
@@@@1@1@@danf@17-8-2009 10620020@unknown@formal@none@1@S@In [[linguistics]], a '''noun''' is a member of a large, [[open class (linguistics)|open]] [[lexical category]] whose members can occur as the main word in the [[subject (grammar)|subject]] of a [[clause]], the [[object (grammar)|object]] of a [[verb]], or the object of a [[preposition]].@@@@1@42@@danf@17-8-2009 10620030@unknown@formal@none@1@S@Lexical categories are defined in terms of how their members combine with other kinds of expressions.@@@@1@16@@danf@17-8-2009 10620040@unknown@formal@none@1@S@The syntactic rules for nouns differ from language to language.@@@@1@10@@danf@17-8-2009 10620050@unknown@formal@none@1@S@In [[English language|English]], nouns may be defined as those words which can occur with articles and [[adjective|attributive adjectives]] and can function as the [[phrase|head]] of a [[noun phrase]].@@@@1@28@@danf@17-8-2009 10620060@unknown@formal@none@1@S@In [[traditional grammar|traditional]] English grammar, the noun is one of the eight [[parts of speech]].@@@@1@15@@danf@17-8-2009 10620070@unknown@formal@none@1@S@==History==@@@@1@1@@danf@17-8-2009 10620080@unknown@formal@none@1@S@The word comes from the [[Latin]] ''nomen'' meaning "[[name]]".@@@@1@9@@danf@17-8-2009 10620090@unknown@formal@none@1@S@Word classes like nouns were first described by the Sanskrit grammarian [[Panini (grammarian)|{{IAST|Pāṇini}}]] and ancient Greeks like [[Dionysios Thrax]]; and were defined in terms of their [[morphology (linguistics)|morphological]] properties.@@@@1@29@@danf@17-8-2009 10620100@unknown@formal@none@1@S@For example, in Ancient Greek, nouns inflect for [[case (grammar)|grammatical case]], such as dative or accusative.@@@@1@16@@danf@17-8-2009 10620110@unknown@formal@none@1@S@[[Verb]]s, on the other hand, inflect for [[grammatical tense|tenses]], such as past, present or future, while nouns do not.@@@@1@19@@danf@17-8-2009 10620120@unknown@formal@none@1@S@[[Aristotle]] also had a notion of ''onomata'' (nouns) and ''rhemata'' (verbs) which, however, does not exactly correspond with modern notions of nouns and verbs.@@@@1@24@@danf@17-8-2009 10620130@unknown@formal@none@1@S@Vinokurova 2005 has a more detailed discussion of the historical origin of the notion of a noun.@@@@1@17@@danf@17-8-2009 10620140@unknown@formal@none@1@S@==Different definitions of nouns==@@@@1@4@@danf@17-8-2009 10620150@unknown@formal@none@1@S@Expressions of [[natural language]] have properties at different levels.@@@@1@9@@danf@17-8-2009 10620160@unknown@formal@none@1@S@They have ''formal'' properties, like what kinds of [[morphology (linguistics)|morphological]] [[prefix]]es or [[suffix]]es they take and what kinds of other expressions they combine with; but they also have [[semantics|semantic]] properties, i.e. 
properties pertaining to their meaning.@@@@1@36@@danf@17-8-2009 10620170@unknown@formal@none@1@S@The definition of a noun at the outset of this page is thus a ''formal'', traditional grammatical definition.@@@@1@18@@danf@17-8-2009 10620180@unknown@formal@none@1@S@That definition, for the most part, is considered uncontroversial and enables language users to distinguish most nouns from non-nouns effectively.@@@@1@24@@danf@17-8-2009 10620190@unknown@formal@none@1@S@However, it has the disadvantage that it does not apply to nouns in all languages.@@@@1@15@@danf@17-8-2009 10620200@unknown@formal@none@1@S@For example, in [[Russian language|Russian]], there are no definite articles, so one cannot define nouns as words that are modified by definite articles.@@@@1@23@@danf@17-8-2009 10620210@unknown@formal@none@1@S@There have also been several attempts to define nouns in terms of their [[semantics|semantic]] properties.@@@@1@14@@danf@17-8-2009 10620220@unknown@formal@none@1@S@Many of these are controversial, but some are discussed below.@@@@1@10@@danf@17-8-2009 10620230@unknown@formal@none@1@S@===Names for things===@@@@1@3@@danf@17-8-2009 10620240@unknown@formal@none@1@S@In [[Traditional grammar|traditional school grammars]], one often encounters the definition that nouns are all and only those expressions that refer to a ''person'', ''place'', ''thing'', ''event'', ''substance'', ''quality'', or ''idea'', etc.@@@@1@33@@danf@17-8-2009 10620250@unknown@formal@none@1@S@This is a ''semantic'' definition.@@@@1@5@@danf@17-8-2009 10620260@unknown@formal@none@1@S@It has been criticized by contemporary linguists as being uninformative.@@@@1@10@@danf@17-8-2009 10620270@unknown@formal@none@1@S@Contemporary linguists generally agree that one cannot successfully define nouns (or other grammatical categories) in terms of what sort of ''object in the world'' they ''[[reference|refer]] to'' or ''[[signification|signify]]''.@@@@1@29@@danf@17-8-2009 10620280@unknown@formal@none@1@S@Part of the [[conundrum]] is that the definition makes use of relatively ''general'' nouns ("thing", "phenomenon", "event") to define what nouns ''are''.@@@@1@22@@danf@17-8-2009 10620290@unknown@formal@none@1@S@The existence of such ''general'' nouns demonstrates that nouns refer to entities that are organized in [[taxonomy|taxonomic]] [[hierarchies]].@@@@1@18@@danf@17-8-2009 10620300@unknown@formal@none@1@S@But other kinds of expressions are also organized into such structured taxonomic relationships.@@@@1@13@@danf@17-8-2009 10620310@unknown@formal@none@1@S@For example, the verbs "stroll", "saunter", "stride", and "tread" are more specific words than the more ''general'' "walk".@@@@1@17@@danf@17-8-2009 10620320@unknown@formal@none@1@S@Moreover, "walk" is more specific than the verb "move", which, in turn, is less general than "change".@@@@1@17@@danf@17-8-2009 10620330@unknown@formal@none@1@S@But it is unlikely that such taxonomic relationships can be used to ''define'' nouns and verbs.@@@@1@16@@danf@17-8-2009 10620340@unknown@formal@none@1@S@We cannot ''define'' verbs as those words that refer to "changes" or "states", for example, because the nouns ''change'' and ''state'' probably refer to such things, but, of course, aren't verbs.@@@@1@31@@danf@17-8-2009 10620350@unknown@formal@none@1@S@Similarly, nouns like "invasion", "meeting", or "collapse" refer to things that are "done" or that "happen".@@@@1@15@@danf@17-8-2009 10620360@unknown@formal@none@1@S@In fact, an influential [[theory]] has it that verbs like
"kill" or "die" refer to events, which is among the sort of thing that nouns are supposed to refer to.@@@@1@30@@danf@17-8-2009 10620370@unknown@formal@none@1@S@The point being made here is not that this view of verbs is wrong, but rather that this property of verbs is a poor basis for a ''definition'' of this category, just like the property of ''having wheels'' is a poor basis for a definition of cars (some things that have wheels, such as my suitcase or a jumbo jet, aren't cars).@@@@1@62@@danf@17-8-2009 10620380@unknown@formal@none@1@S@Similarly, adjectives like "yellow" or "difficult" might be thought to refer to qualities, and adverbs like "outside" or "upstairs" seem to refer to places, which are also among the sorts of things nouns can refer to.@@@@1@36@@danf@17-8-2009 10620390@unknown@formal@none@1@S@But verbs, adjectives and adverbs are not nouns, and nouns aren't verbs, adjectives or adverbs.@@@@1@15@@danf@17-8-2009 10620400@unknown@formal@none@1@S@One might argue that "definitions" of this sort really rely on speakers' prior intuitive knowledge of what nouns, verbs and adjectives are, and, so don't really add anything over and beyond this.@@@@1@32@@danf@17-8-2009 10620410@unknown@formal@none@1@S@Speakers' intuitive knowledge of such things might plausibly be based on ''formal'' criteria, such as the traditional grammatical definition of English nouns aforementioned.@@@@1@23@@danf@17-8-2009 10620420@unknown@formal@none@1@S@===Prototypically referential expressions===@@@@1@3@@danf@17-8-2009 10620430@unknown@formal@none@1@S@Another semantic definition of nouns is that they are ''prototypically referential.''@@@@1@11@@danf@17-8-2009 10620440@unknown@formal@none@1@S@That definition is also not very helpful in distinguishing actual nouns from verbs.@@@@1@13@@danf@17-8-2009 10620450@unknown@formal@none@1@S@But it may still correctly identify a core property of nounhood.@@@@1@11@@danf@17-8-2009 10620460@unknown@formal@none@1@S@For example, we will tend to use nouns like "fool" and "car" when we wish to refer to fools and cars, respectively.@@@@1@22@@danf@17-8-2009 10620470@unknown@formal@none@1@S@The notion that this is '''prototypical''' reflects the fact that such nouns can be used, even though nothing with the corresponding property is referred to:@@@@1@25@@danf@17-8-2009 10620480@unknown@formal@none@1@S@:John is no '''fool'''.@@@@1@4@@danf@17-8-2009 10620490@unknown@formal@none@1@S@:If I had a '''car''', I'd go to Marrakech.@@@@1@9@@danf@17-8-2009 10620500@unknown@formal@none@1@S@The first sentence above doesn't refer to any fools, nor does the second one refer to any particular car.@@@@1@19@@danf@17-8-2009 10620510@unknown@formal@none@1@S@===Predicates with identity criteria===@@@@1@4@@danf@17-8-2009 10620520@unknown@formal@none@1@S@The British logician [[Peter Thomas Geach]] proposed a very subtle semantic definition of nouns.@@@@1@14@@danf@17-8-2009 10620530@unknown@formal@none@1@S@He noticed that adjectives like "same" can modify nouns, but no other kinds of parts of speech, like [[verbs]] or [[adjectives]].@@@@1@21@@danf@17-8-2009 10620540@unknown@formal@none@1@S@Not only that, but there also doesn't seem to be any ''other'' expressions with similar meaning that can modify verbs and adjectives.@@@@1@22@@danf@17-8-2009 10620550@unknown@formal@none@1@S@Consider the following examples.@@@@1@4@@danf@17-8-2009 10620560@unknown@formal@none@1@S@: Good: John and Bill participated in the '''same''' fight.@@@@1@10@@danf@17-8-2009 10620570@unknown@formal@none@1@S@: 
Bad:@@@@1@2@@danf@17-8-2009 10620580@unknown@formal@none@1@S@*John and Bill '''samely''' fought.@@@@1@5@@danf@17-8-2009 10620590@unknown@formal@none@1@S@There is no English adverb "samely".@@@@1@6@@danf@17-8-2009 10620600@unknown@formal@none@1@S@In some other languages, like Czech, however there are adverbs corresponding to "samely".@@@@1@13@@danf@17-8-2009 10620610@unknown@formal@none@1@S@Hence, in Czech, the translation of the last sentence would be fine; however, it would mean that John and Bill fought ''in the same way'': not that they participated in the ''same fight''.@@@@1@33@@danf@17-8-2009 10620620@unknown@formal@none@1@S@Geach proposed that we could explain this, if nouns denote logical [[predicate (grammar)|predicate]]s with '''identity criteria'''.@@@@1@16@@danf@17-8-2009 10620630@unknown@formal@none@1@S@An identity criterion would allow us to conclude, for example, that "person x at time 1 is ''the same person'' as person y at time 2".@@@@1@26@@danf@17-8-2009 10620640@unknown@formal@none@1@S@Different nouns can have different identity criteria.@@@@1@7@@danf@17-8-2009 10620650@unknown@formal@none@1@S@A well known example of this is due to Gupta:@@@@1@10@@danf@17-8-2009 10620660@unknown@formal@none@1@S@:National Airlines transported 2 million '''passengers''' in 1979.@@@@1@8@@danf@17-8-2009 10620670@unknown@formal@none@1@S@:National Airlines transported (at least) 2 million '''persons''' in 1979.@@@@1@10@@danf@17-8-2009 10620680@unknown@formal@none@1@S@Given that, in general, all passengers are persons, the last sentence above ought to follow logically from the first one.@@@@1@20@@danf@17-8-2009 10620690@unknown@formal@none@1@S@But it doesn't.@@@@1@3@@danf@17-8-2009 10620700@unknown@formal@none@1@S@It is easy to imagine, for example, that on average, every person who travelled with National Airlines in 1979, travelled with them twice.@@@@1@23@@danf@17-8-2009 10620710@unknown@formal@none@1@S@In that case, one would say that the airline transported 2 million ''passengers'' but only 1 million ''persons''.@@@@1@18@@danf@17-8-2009 10620720@unknown@formal@none@1@S@Thus, the way that we count ''passengers'' isn't necessarily the same as the way that we count ''persons''.@@@@1@18@@danf@17-8-2009 10620730@unknown@formal@none@1@S@Put somewhat differently: At two different times, ''you'' may correspond to two distinct ''passengers'', even though you are one and the same person.@@@@1@23@@danf@17-8-2009 10620740@unknown@formal@none@1@S@For a precise definition of ''identity criteria'', see Gupta.@@@@1@9@@danf@17-8-2009 10620750@unknown@formal@none@1@S@Recently, Baker has proposed that Geach's definition of nouns in terms of identity criteria allows us to ''explain'' the characteristic properties of nouns.@@@@1@23@@danf@17-8-2009 10620760@unknown@formal@none@1@S@He argues that nouns can co-occur with (in-)definite articles and numerals, and are "prototypically referential" ''because'' they are all and only those [[parts of speech]] that provide identity criteria.@@@@1@29@@danf@17-8-2009 10620770@unknown@formal@none@1@S@Baker's proposals are quite new, and linguists are still evaluating them.@@@@1@11@@danf@17-8-2009 10620780@unknown@formal@none@1@S@==Classification of nouns in English==@@@@1@5@@danf@17-8-2009 10620790@unknown@formal@none@1@S@===Proper nouns and common nouns===@@@@1@5@@danf@17-8-2009 10620800@unknown@formal@none@1@S@''Proper nouns'' (also called ''[[proper name]]s'') are nouns representing unique entities (such as ''London'', ''Universe'' or ''John''), as distinguished from 
common nouns which describe a class of entities (such as ''city'', ''planet'' or ''person'').@@@@1@34@@danf@17-8-2009 10620810@unknown@formal@none@1@S@In [[English language|English]] and most other languages that use the [[Latin alphabet]], proper nouns are usually [[capitalization|capitalized]].@@@@1@17@@danf@17-8-2009 10620820@unknown@formal@none@1@S@Languages differ in whether most elements of multiword proper nouns are capitalised (e.g., American English ''House of Representatives'') or only the initial element (e.g., Slovenian ''Državni zbor'' 'National Assembly').@@@@1@29@@danf@17-8-2009 10620830@unknown@formal@none@1@S@In [[German language|German]], nouns of all types are capitalized.@@@@1@9@@danf@17-8-2009 10620840@unknown@formal@none@1@S@The convention of capitalizing ''all'' nouns was previously used in English, but ended circa 1800.@@@@1@15@@danf@17-8-2009 10620850@unknown@formal@none@1@S@In America, the shift in capitalization is recorded in several noteworthy documents.@@@@1@12@@danf@17-8-2009 10620860@unknown@formal@none@1@S@The end (but not the beginning) of the [[United States Declaration of Independence#Annotated text of the Declaration|Declaration of Independence]] (1776) and all of the [[United States Constitution|Constitution]] (1787) show nearly all nouns capitalized, the [[United States Bill of Rights#Text of the Bill of Rights|Bill of Rights]] (1789) capitalizes a few common nouns but not most of them, and the [[Thirteenth Amendment to the United States Constitution|Thirteenth Constitutional Amendment]] (1865) only capitalizes proper nouns.@@@@1@73@@danf@17-8-2009 10620870@unknown@formal@none@1@S@Sometimes the same word can function as both a common noun and a proper noun, where one such entity is special.@@@@1@21@@danf@17-8-2009 10620880@unknown@formal@none@1@S@For example the common noun ''god'' denotes all deities, while the proper noun ''God'' references the [[monotheism|monotheistic]] [[God]] specifically.@@@@1@19@@danf@17-8-2009 10620890@unknown@formal@none@1@S@Owing to the essentially arbitrary nature of [[Orthography|orthographic]] classification and the existence of variant authorities and adopted [[Style guide|''house styles'']], questionable capitalization of words is not uncommon, even in respected newspapers and magazines.@@@@1@33@@danf@17-8-2009 10620900@unknown@formal@none@1@S@Most publishers, however, properly require ''consistency'', at least within the same document, in applying their specified standard.@@@@1@17@@danf@17-8-2009 10620910@unknown@formal@none@1@S@The common meaning of the word or words constituting a proper noun may be unrelated to the object to which the proper noun refers.@@@@1@24@@danf@17-8-2009 10620920@unknown@formal@none@1@S@For example, someone might be named "Tiger Smith" despite being neither a [[tiger]] nor a [[smith (metalwork)|smith]].@@@@1@17@@danf@17-8-2009 10620930@unknown@formal@none@1@S@For this reason, proper nouns are usually not [[translation|translated]] between languages, although they may be [[transliteration|transliterated]].@@@@1@16@@danf@17-8-2009 10620940@unknown@formal@none@1@S@For example, the German surname ''Knödel'' becomes ''Knodel'' or ''Knoedel'' in English (not the literal ''Dumpling'').@@@@1@16@@danf@17-8-2009 10620950@unknown@formal@none@1@S@However, the [[Transliteration|transcription]] of place names and the names of [[monarch]]s, [[pope]]s, and non-contemporary [[author]]s is common and sometimes universal.@@@@1@20@@danf@17-8-2009 10620960@unknown@formal@none@1@S@For instance, the [[Portuguese 
language|Portuguese]] word ''Lisboa'' becomes ''[[Lisbon]]'' in [[English language|English]]; the English ''London'' becomes ''Londres'' in French; and the [[ancient Greek|Greek]] ''Aristotelēs'' becomes [[Aristotle]] in English.@@@@1@28@@danf@17-8-2009 10620970@unknown@formal@none@1@S@===Countable and uncountable nouns===@@@@1@4@@danf@17-8-2009 10620980@unknown@formal@none@1@S@''Count nouns'' are common nouns that can take a [[plural]], can combine with [[numerals]] or [[quantifiers]] (e.g. "one", "two", "several", "every", "most"), and can take an indefinite article ("a" or "an").@@@@1@31@@danf@17-8-2009 10620990@unknown@formal@none@1@S@Examples of count nouns are "chair", "nose", and "occasion".@@@@1@9@@danf@17-8-2009 10621000@unknown@formal@none@1@S@''Mass nouns'' (or ''non-count nouns'') differ from count nouns in precisely that respect: they can't take plural or combine with number words or quantifiers.@@@@1@24@@danf@17-8-2009 10621010@unknown@formal@none@1@S@Examples from English include "laughter", "cutlery", "helium", and "furniture".@@@@1@9@@danf@17-8-2009 10621020@unknown@formal@none@1@S@For example, it is not possible to refer to "a furniture" or "three furnitures".@@@@1@14@@danf@17-8-2009 10621030@unknown@formal@none@1@S@This is true even though the pieces of furniture comprising "furniture" could be counted.@@@@1@14@@danf@17-8-2009 10621040@unknown@formal@none@1@S@Thus the distinction between mass and count nouns shouldn't be made in terms of what sorts of things the nouns ''refer'' to, but rather in terms of how the nouns ''present'' these entities.@@@@1@33@@danf@17-8-2009 10621050@unknown@formal@none@1@S@===Collective nouns===@@@@1@2@@danf@17-8-2009 10621060@unknown@formal@none@1@S@''Collective nouns'' are nouns that refer to ''groups'' consisting of more than one individual or entity, even when they are inflected for the [[Grammatical number|singular]].@@@@1@25@@danf@17-8-2009 10621070@unknown@formal@none@1@S@Examples include "committee", "herd", and "school" (of herring).@@@@1@8@@danf@17-8-2009 10621080@unknown@formal@none@1@S@These nouns have slightly different grammatical properties than other nouns.@@@@1@10@@danf@17-8-2009 10621090@unknown@formal@none@1@S@For example, the [[noun phrases]] that they [[head (syntax)|head]] can serve as the [[subject (grammar)|subject]] of a [[collective predicate]], even when they are inflected for the singular.@@@@1@27@@danf@17-8-2009 10621100@unknown@formal@none@1@S@A [[collective predicate]] is a predicate that normally can't take a singular subject.@@@@1@13@@danf@17-8-2009 10621110@unknown@formal@none@1@S@An example of the latter is "talked to each other".@@@@1@10@@danf@17-8-2009 10621120@unknown@formal@none@1@S@:Good: The '''boys''' talked to each other.@@@@1@7@@danf@17-8-2009 10621130@unknown@formal@none@1@S@:Bad: *The '''boy''' talked to each other.@@@@1@7@@danf@17-8-2009 10621140@unknown@formal@none@1@S@:Good: The '''committee''' talked to each other.@@@@1@7@@danf@17-8-2009 10621150@unknown@formal@none@1@S@===Concrete nouns and abstract nouns===@@@@1@5@@danf@17-8-2009 10621160@unknown@formal@none@1@S@''Concrete nouns'' refer to [[physical bodies]] which you use at least one of your [[sense]]s to observe.@@@@1@17@@danf@17-8-2009 10621170@unknown@formal@none@1@S@For instance, "chair", "apple", or "Janet".@@@@1@6@@danf@17-8-2009 10621180@unknown@formal@none@1@S@''Abstract nouns'' on the other hand refer to [[abstract object]]s, that is ideas or concepts, such as "justice" or "hate".@@@@1@20@@danf@17-8-2009 
10621190@unknown@formal@none@1@S@While this distinction is sometimes useful, the boundary between the two of them is not always clear; consider, for example, the noun "art".@@@@1@23@@danf@17-8-2009 10621200@unknown@formal@none@1@S@In English, many abstract nouns are formed by adding noun-forming suffixes ("-ness", "-ity", "-tion") to adjectives or verbs.@@@@1@18@@danf@17-8-2009 10621210@unknown@formal@none@1@S@Examples are "happiness", "circulation" and "serenity".@@@@1@6@@danf@17-8-2009 10621220@unknown@formal@none@1@S@==Nouns and pronouns==@@@@1@3@@danf@17-8-2009 10621230@unknown@formal@none@1@S@[[Noun phrase]]s can typically be replaced by [[pronoun]]s, such as "he", "it", "which", and "those", in order to avoid repetition or explicit identification, or for other reasons.@@@@1@27@@danf@17-8-2009 10621240@unknown@formal@none@1@S@For example, in the sentence "Janet thought that he was weird", the word "he" is a pronoun standing in place of the name of the person in question.@@@@1@28@@danf@17-8-2009 10621250@unknown@formal@none@1@S@The English word ''one'' can replace parts of [[noun phrase]]s, and it sometimes stands in for a noun.@@@@1@18@@danf@17-8-2009 10621260@unknown@formal@none@1@S@An example is given below:@@@@1@5@@danf@17-8-2009 10621270@unknown@formal@none@1@S@: John's car is newer than ''the one'' that Bill has.@@@@1@11@@danf@17-8-2009 10621280@unknown@formal@none@1@S@But ''one'' can also stand in for bigger subparts of a noun phrase.@@@@1@13@@danf@17-8-2009 10621290@unknown@formal@none@1@S@For example, in the following example, ''one'' can stand in for ''new car''.@@@@1@13@@danf@17-8-2009 10621300@unknown@formal@none@1@S@: This new car is cheaper than ''that one''.@@@@1@9@@danf@17-8-2009 10621310@unknown@formal@none@1@S@==Substantive as a word for "noun"==@@@@1@6@@danf@17-8-2009 10621320@unknown@formal@none@1@S@Starting with old [[Latin language|Latin]] grammars, many European languages use some form of the word ''substantive'' as the basic term for noun.@@@@1@22@@danf@17-8-2009 10621330@unknown@formal@none@1@S@Nouns in the dictionaries of such languages are demarked by the abbreviation "s" instead of "n", which may be used for proper nouns instead.@@@@1@24@@danf@17-8-2009 10621340@unknown@formal@none@1@S@This corresponds to those grammars in which nouns and adjectives phase into each other in more areas than, for example, the English term [[Predicative_adjective#Predicative_adjective|predicate adjective]] entails.@@@@1@26@@danf@17-8-2009 10621350@unknown@formal@none@1@S@In French and Spanish, for example, adjectives frequently act as nouns referring to people who have the characteristics of the adjective.@@@@1@21@@danf@17-8-2009 10621360@unknown@formal@none@1@S@An example in English is:@@@@1@5@@danf@17-8-2009 10621370@unknown@formal@none@1@S@: The ''poor'' you have always with you.@@@@1@8@@danf@17-8-2009 10621380@unknown@formal@none@1@S@Similarly, an adjective can also be used for a whole group or organization of people:@@@@1@15@@danf@17-8-2009 10621390@unknown@formal@none@1@S@: The Socialist ''International''.@@@@1@4@@danf@17-8-2009 10621400@unknown@formal@none@1@S@Hence, these words are substantives that are usually adjectives in English.@@@@1@11@@danf@17-8-2009 10630010@unknown@formal@none@1@S@
Ontology (information science)
@@@@1@3@@danf@17-8-2009 10630020@unknown@formal@none@1@S@In both [[computer science]] and [[information science]], an '''ontology''' is a formal representation of a set of concepts within a [[Domain of discourse|domain]] and the relationships between those concepts.@@@@1@29@@danf@17-8-2009 10630030@unknown@formal@none@1@S@It is used to [[Reasoning|reason]] about the properties of that domain, and may be used to define the domain.@@@@1@19@@danf@17-8-2009 10630040@unknown@formal@none@1@S@Ontologies are used in [[artificial intelligence]], the [[Semantic Web]], [[software engineering]], [[biomedical informatics]], [[library science]], and [[information architecture]] as a form of [[knowledge representation]] about the world or some part of it.@@@@1@32@@danf@17-8-2009 10630050@unknown@formal@none@1@S@Common components of ontologies include:@@@@1@5@@danf@17-8-2009 10630060@unknown@formal@none@1@S@* Individuals: instances or objects (the basic or "ground level" objects)@@@@1@11@@danf@17-8-2009 10630070@unknown@formal@none@1@S@* [[Class]]es: [[set (computer science)|set]]s, collections, concepts or types of objects@@@@1@11@@danf@17-8-2009 10630080@unknown@formal@none@1@S@* [[Attribute (computing)|Attribute]]s: properties, features, characteristics, or parameters that objects (and classes) can have@@@@1@14@@danf@17-8-2009 10630090@unknown@formal@none@1@S@* [[Relation (mathematics)|Relations]]: ways that classes and objects can be related to one another@@@@1@14@@danf@17-8-2009 10630100@unknown@formal@none@1@S@* Function terms: complex structures formed from certain relations that can be used in place of an individual term in a statement@@@@1@22@@danf@17-8-2009 10630110@unknown@formal@none@1@S@* Restrictions: formally stated descriptions of what must be true in order for some assertion to be accepted as input@@@@1@20@@danf@17-8-2009 10630120@unknown@formal@none@1@S@* Rules: statements in the form of an if-then (antecedent-consequent) sentence that describe the logical inferences that can be drawn from an assertion in a particular form@@@@1@27@@danf@17-8-2009 10630130@unknown@formal@none@1@S@* Axioms: assertions (including rules) in a logical form that together comprise the overall theory that the ontology describes in its domain of application.@@@@1@24@@danf@17-8-2009 10630140@unknown@formal@none@1@S@This definition differs from that of "axioms" in generative grammar and formal logic.@@@@1@13@@danf@17-8-2009 10630150@unknown@formal@none@1@S@In these disciplines, axioms include only statements asserted as ''a priori'' knowledge.@@@@1@12@@danf@17-8-2009 10630160@unknown@formal@none@1@S@As used here, "axioms" also include the theory derived from axiomatic statements.@@@@1@12@@danf@17-8-2009 10630170@unknown@formal@none@1@S@* [[Event (philosophy)|Events]]: the changing of attributes or relations@@@@1@9@@danf@17-8-2009 10630180@unknown@formal@none@1@S@Ontologies are commonly encoded using [[ontology language]]s.@@@@1@7@@danf@17-8-2009 10630190@unknown@formal@none@1@S@== Elements ==@@@@1@3@@danf@17-8-2009 10630200@unknown@formal@none@1@S@Contemporary ontologies share many structural similarities, regardless of the language in which they are expressed.@@@@1@15@@danf@17-8-2009 10630210@unknown@formal@none@1@S@As mentioned above, most ontologies describe individuals (instances), classes (concepts), attributes, and relations.@@@@1@13@@danf@17-8-2009 10630220@unknown@formal@none@1@S@In this section each of these components is discussed in turn.@@@@1@11@@danf@17-8-2009 10630230@unknown@formal@none@1@S@=== 
Individuals ===@@@@1@3@@danf@17-8-2009 10630240@unknown@formal@none@1@S@Individuals (instances) are the basic, "ground level" components of an ontology.@@@@1@11@@danf@17-8-2009 10630250@unknown@formal@none@1@S@The individuals in an ontology may include concrete objects such as people, animals, tables, automobiles, molecules, and planets, as well as abstract individuals such as numbers and words.@@@@1@28@@danf@17-8-2009 10630260@unknown@formal@none@1@S@Strictly speaking, an ontology need not include any individuals, but one of the general purposes of an ontology is to provide a means of classifying individuals, even if those individuals are not explicitly part of the ontology.@@@@1@37@@danf@17-8-2009 10630270@unknown@formal@none@1@S@In formal extensional ontologies, only the utterances of words and numbers are considered individuals – the numbers and names themselves are classes.@@@@1@22@@danf@17-8-2009 10630280@unknown@formal@none@1@S@In a 4D ontology, an individual is identified by its spatio-temporal extent.@@@@1@12@@danf@17-8-2009 10630290@unknown@formal@none@1@S@Examples of formal extensional ontologies are [[ISO 15926]] and the model in development by the [[IDEAS Group]].@@@@1@17@@danf@17-8-2009 10630300@unknown@formal@none@1@S@=== Classes ===@@@@1@3@@danf@17-8-2009 10630310@unknown@formal@none@1@S@Classes – concepts that are also called ''type'', ''sort'', ''category'', and ''kind'' – are abstract groups, sets, or collections of objects.@@@@1@21@@danf@17-8-2009 10630320@unknown@formal@none@1@S@They may contain individuals, other classes, or a combination of both.@@@@1@11@@danf@17-8-2009 10630330@unknown@formal@none@1@S@Some examples of classes:@@@@1@4@@danf@17-8-2009 10630340@unknown@formal@none@1@S@* ''Person'', the class of all people@@@@1@7@@danf@17-8-2009 10630350@unknown@formal@none@1@S@* ''Vehicle'', the class of all vehicles@@@@1@7@@danf@17-8-2009 10630360@unknown@formal@none@1@S@* ''Car'', the class of all cars@@@@1@7@@danf@17-8-2009 10630370@unknown@formal@none@1@S@* ''Class'', representing the class of all classes@@@@1@8@@danf@17-8-2009 10630380@unknown@formal@none@1@S@* ''Thing'', representing the class of all things@@@@1@8@@danf@17-8-2009 10630390@unknown@formal@none@1@S@Ontologies vary on whether classes can contain other classes, whether a class can belong to itself, whether there is a universal class (that is, a class containing everything), etc.@@@@1@29@@danf@17-8-2009 10630400@unknown@formal@none@1@S@Sometimes restrictions along these lines are made in order to avoid certain well-known [[paradox]]es.@@@@1@14@@danf@17-8-2009 10630410@unknown@formal@none@1@S@The classes of an ontology may be [[extensional]] or [[intensional]] in nature.@@@@1@12@@danf@17-8-2009 10630420@unknown@formal@none@1@S@A class is extensional if and only if it is characterized solely by its membership.@@@@1@15@@danf@17-8-2009 10630430@unknown@formal@none@1@S@More precisely, a class C is extensional if and only if for any class C', if C' has exactly the same members as C, then C and C' are identical.@@@@1@30@@danf@17-8-2009 10630440@unknown@formal@none@1@S@If a class does not satisfy this condition, then it is intensional.@@@@1@12@@danf@17-8-2009 10630450@unknown@formal@none@1@S@While extensional classes are more well-behaved and well-understood mathematically, as well as less problematic philosophically, they do not permit the fine grained distinctions that ontologies often need to make.@@@@1@29@@danf@17-8-2009 10630460@unknown@formal@none@1@S@For example, an ontology may want to distinguish 
between the class of all creatures with a kidney and the class of all creatures with a heart, even if these classes happen to have exactly the same members.@@@@1@37@@danf@17-8-2009 10630470@unknown@formal@none@1@S@In the upper ontologies mentioned above, the classes are defined intensionally.@@@@1@11@@danf@17-8-2009 10630480@unknown@formal@none@1@S@Intensionally defined classes usually have necessary conditions associated with membership in each class.@@@@1@13@@danf@17-8-2009 10630490@unknown@formal@none@1@S@Some classes may also have sufficient conditions, and in those cases the combination of necessary and sufficient conditions make that class a fully ''defined'' class.@@@@1@25@@danf@17-8-2009 10630500@unknown@formal@none@1@S@Importantly, a class can subsume or be subsumed by other classes; a class subsumed by another is called a ''subclass'' of the subsuming class.@@@@1@24@@danf@17-8-2009 10630510@unknown@formal@none@1@S@For example, ''Vehicle'' subsumes ''Car'', since (necessarily) anything that is a member of the latter class is a member of the former.@@@@1@22@@danf@17-8-2009 10630520@unknown@formal@none@1@S@The subsumption relation is used to create a hierarchy of classes, typically with a maximally general class like ''Thing'' at the top, and very specific classes like ''2002 Ford Explorer'' at the bottom.@@@@1@33@@danf@17-8-2009 10630530@unknown@formal@none@1@S@The critically important consequence of the subsumption relation is the inheritance of properties from the parent (subsuming) class to the child (subsumed) class.@@@@1@23@@danf@17-8-2009 10630540@unknown@formal@none@1@S@Thus, anything that is necessarily true of a parent class is also necessarily true of all of its subsumed child classes.@@@@1@21@@danf@17-8-2009 10630550@unknown@formal@none@1@S@In some ontologies, a class is only allowed to have one parent (''single inheritance''), but in most ontologies, classes are allowed to have any number of parents (''multiple inheritance''), and in the latter case all necessary properties of each parent are inherited by the subsumed child class.@@@@1@47@@danf@17-8-2009 10630560@unknown@formal@none@1@S@Thus a particular class of animal (''HouseCat'') may be a child of the class ''Cat'' and also a child of the class ''Pet''.@@@@1@23@@danf@17-8-2009 10630570@unknown@formal@none@1@S@A partition is a set of related classes and associated rules that allow objects to be placed into the appropriate class.@@@@1@21@@danf@17-8-2009 10630580@unknown@formal@none@1@S@For example, to the right is the partial diagram of an ontology that has a partition of the ''Car'' class into the classes ''2-Wheel Drive'' and ''4-Wheel Drive''.@@@@1@28@@danf@17-8-2009 10630590@unknown@formal@none@1@S@The partition rule determines if a particular car is placed in the ''2-Wheel Drive'' or the ''4-Wheel Drive'' class.@@@@1@19@@danf@17-8-2009 10630600@unknown@formal@none@1@S@If the partition rule(s) guarantee that a single ''Car'' cannot be in both classes, then the partition is called a disjoint partition.@@@@1@22@@danf@17-8-2009 10630610@unknown@formal@none@1@S@If the partition rules ensure that every concrete object in the super-class is an instance of at least one of the partition classes, then the partition is called an exhaustive partition.@@@@1@31@@danf@17-8-2009 10630620@unknown@formal@none@1@S@=== Attributes ===@@@@1@3@@danf@17-8-2009 10630630@unknown@formal@none@1@S@Objects in the ontology can be described by assigning attributes to them.@@@@1@12@@danf@17-8-2009 10630640@unknown@formal@none@1@S@Each attribute 
has at least a name and a value, and is used to store information that is specific to the object it is attached to.@@@@1@26@@danf@17-8-2009 10630650@unknown@formal@none@1@S@For example the Ford Explorer object has attributes such as:@@@@1@10@@danf@17-8-2009 10630660@unknown@formal@none@1@S@* ''Name'': Ford Explorer@@@@1@4@@danf@17-8-2009 10630670@unknown@formal@none@1@S@* ''Number-of-doors'': 4@@@@1@3@@danf@17-8-2009 10630680@unknown@formal@none@1@S@* ''Engine'': {4.0L, 4.6L}@@@@1@4@@danf@17-8-2009 10630690@unknown@formal@none@1@S@* ''Transmission'': 6-speed@@@@1@3@@danf@17-8-2009 10630700@unknown@formal@none@1@S@The value of an attribute can be a complex [[data type]]; in this example, the value of the attribute called ''Engine'' is a list of values, not just a single value.@@@@1@31@@danf@17-8-2009 10630710@unknown@formal@none@1@S@If you did not define attributes for the concepts you would have either a [[taxonomy]] (if [[hyponym]] relationships exist between concepts) or a '''controlled vocabulary'''.@@@@1@25@@danf@17-8-2009 10630720@unknown@formal@none@1@S@These are useful, but are not considered true ontologies.@@@@1@9@@danf@17-8-2009 10630730@unknown@formal@none@1@S@===Relationships===@@@@1@1@@danf@17-8-2009 10630740@unknown@formal@none@1@S@An important use of attributes is to describe the relationships (also known as relations) between objects in the ontology.@@@@1@19@@danf@17-8-2009 10630750@unknown@formal@none@1@S@Typically a relation is an attribute whose value is another object in the ontology.@@@@1@14@@danf@17-8-2009 10630760@unknown@formal@none@1@S@For example in the ontology that contains the Ford Explorer and the [[Ford Bronco]], the Ford Bronco object might have the following attribute:@@@@1@23@@danf@17-8-2009 10630770@unknown@formal@none@1@S@* ''Successor'': Ford Explorer@@@@1@4@@danf@17-8-2009 10630780@unknown@formal@none@1@S@This tells us that the Explorer is the model that replaced the Bronco.@@@@1@13@@danf@17-8-2009 10630790@unknown@formal@none@1@S@Much of the power of ontologies comes from the ability to describe these relations.@@@@1@14@@danf@17-8-2009 10630800@unknown@formal@none@1@S@Together, the set of relations describes the [[semantics]] of the domain.@@@@1@11@@danf@17-8-2009 10630810@unknown@formal@none@1@S@The most important type of relation is the [[subsumption]] relation (''is-[[superclass]]-of'', the converse of ''[[is-a]]'', ''is-subtype-of'' or ''is-[[subclass]]-of'').@@@@1@18@@danf@17-8-2009 10630820@unknown@formal@none@1@S@This defines which objects are members of classes of objects.@@@@1@10@@danf@17-8-2009 10630830@unknown@formal@none@1@S@For example we have already seen that the Ford Explorer ''is-a'' 4-wheel drive, which in turn ''is-a'' Car:@@@@1@18@@danf@17-8-2009 10630840@unknown@formal@none@1@S@The addition of the is-a relationships has created a hierarchical [[taxonomy]]; a tree-like structure (or, more generally, a [[partially ordered set]]) that clearly depicts how objects relate to one another.@@@@1@30@@danf@17-8-2009 10630850@unknown@formal@none@1@S@In such a structure, each object is the 'child' of a 'parent class' (Some languages restrict the is-a relationship to one parent for all nodes, but many do not).@@@@1@29@@danf@17-8-2009 10630860@unknown@formal@none@1@S@Another common type of relations is the [[meronymy]] relation, written as ''part-of'', that represents how objects combine together to form composite objects.@@@@1@22@@danf@17-8-2009 10630870@unknown@formal@none@1@S@For example, if we extended our example ontology to include objects 
like Steering Wheel, we would say that "Steering Wheel is-part-of Ford Explorer" since a steering wheel is one of the components of a Ford Explorer.@@@@1@36@@danf@17-8-2009 10630880@unknown@formal@none@1@S@If we introduce meronymy relationships to our ontology, we find that this simple and elegant tree structure quickly becomes complex and significantly more difficult to interpret manually.@@@@1@27@@danf@17-8-2009 10630890@unknown@formal@none@1@S@It is not difficult to understand why; an entity that is described as 'part of' another entity might also be 'part of' a third entity.@@@@1@25@@danf@17-8-2009 10630900@unknown@formal@none@1@S@Consequently, entities may have more than one parent.@@@@1@8@@danf@17-8-2009 10630910@unknown@formal@none@1@S@The structure that emerges is known as a [[directed acyclic graph]] (DAG).@@@@1@12@@danf@17-8-2009 10630920@unknown@formal@none@1@S@As well as the standard is-a and part-of relations, ontologies often include additional types of relation that further refine the semantics they model.@@@@1@23@@danf@17-8-2009 10630930@unknown@formal@none@1@S@These relations are often domain-specific and are used to answer particular types of question.@@@@1@14@@danf@17-8-2009 10630940@unknown@formal@none@1@S@For example, in the domain of automobiles, we might define a ''made-in'' relationship which tells us where each car is built.@@@@1@21@@danf@17-8-2009 10630950@unknown@formal@none@1@S@So the Ford Explorer is ''made-in'' [[Louisville, Kentucky|Louisville]].@@@@1@8@@danf@17-8-2009 10630960@unknown@formal@none@1@S@The ontology may also know that Louisville is-in [[Kentucky]] and Kentucky is-a state of the [[United States|USA]].@@@@1@17@@danf@17-8-2009 10630970@unknown@formal@none@1@S@Software using this ontology could now answer a question like "which cars are made in the U.S.?"@@@@1@17@@danf@17-8-2009 10630980@unknown@formal@none@1@S@== Domain ontologies and upper ontologies ==@@@@1@7@@danf@17-8-2009 10630990@unknown@formal@none@1@S@A domain ontology (or domain-specific ontology) models a specific domain, or part of the world.@@@@1@15@@danf@17-8-2009 10631000@unknown@formal@none@1@S@It represents the particular meanings of terms as they apply to that domain.@@@@1@13@@danf@17-8-2009 10631010@unknown@formal@none@1@S@For example, the word ''[[card]]'' has many different meanings.@@@@1@9@@danf@17-8-2009 10631020@unknown@formal@none@1@S@An ontology about the domain of [[poker]] would model the "[[playing card]]" meaning of the word, while an ontology about the domain of [[computer hardware]] would model the "[[punch card]]" and "[[video card]]" meanings.@@@@1@34@@danf@17-8-2009 10631030@unknown@formal@none@1@S@An [[Upper ontology (computer science)|upper ontology]] (or foundation ontology) is a model of the common objects that are generally applicable across a wide range of domain ontologies.@@@@1@27@@danf@17-8-2009 10631040@unknown@formal@none@1@S@It contains a [[core glossary]] in whose terms objects in a set of domains can be described.@@@@1@17@@danf@17-8-2009 10631050@unknown@formal@none@1@S@There are several standardized upper ontologies available for use, including [[Dublin Core]], [[General Formal Ontology|GFO]], [[OpenCyc]]/[[ResearchCyc]], [[Suggested Upper Merged Ontology|SUMO]], and [http://www.loa-cnr.it/DOLCE.html DOLCE].@@@@1@23@@danf@17-8-2009 10631060@unknown@formal@none@1@S@[[WordNet]], while considered an upper ontology by some, is not an ontology: it is a unique combination of a [[taxonomy]] and a controlled vocabulary (see above, under
Attributes).@@@@1@28@@danf@17-8-2009 10631070@unknown@formal@none@1@S@The [[Gellish]] ontology is an example of a combination of an upper and a domain ontology.@@@@1@16@@danf@17-8-2009 10631080@unknown@formal@none@1@S@Since domain ontologies represent concepts in very specific and often eclectic ways, they are often incompatible.@@@@1@16@@danf@17-8-2009 10631090@unknown@formal@none@1@S@As systems that rely on domain ontologies expand, they often need to merge domain ontologies into a more general representation.@@@@1@20@@danf@17-8-2009 10631100@unknown@formal@none@1@S@This presents a challenge to the ontology designer.@@@@1@8@@danf@17-8-2009 10631110@unknown@formal@none@1@S@Different ontologies in the same domain can also arise due to different perceptions of the domain based on cultural background, education, ideology, or because a different representation language was chosen.@@@@1@30@@danf@17-8-2009 10631120@unknown@formal@none@1@S@At present, merging ontologies is a largely manual process and therefore time-consuming and expensive.@@@@1@14@@danf@17-8-2009 10631130@unknown@formal@none@1@S@Using a foundation ontology to provide a common definition of core terms can make this process manageable.@@@@1@17@@danf@17-8-2009 10631140@unknown@formal@none@1@S@There are studies on generalized techniques for merging ontologies, but this area of research is still largely theoretical.@@@@1@18@@danf@17-8-2009 10631150@unknown@formal@none@1@S@== Ontology languages ==@@@@1@4@@danf@17-8-2009 10631160@unknown@formal@none@1@S@An [[ontology language]] is a [[formal language]] used to encode the ontology.@@@@1@12@@danf@17-8-2009 10631170@unknown@formal@none@1@S@There are a number of such languages for ontologies, both proprietary and standards-based:@@@@1@13@@danf@17-8-2009 10631180@unknown@formal@none@1@S@* [[Web Ontology Language|OWL]] is a language for making ontological statements, developed as a follow-on from [[Resource Description Framework|RDF]] and [[RDFS]], as well as earlier ontology language projects including [[Ontology Inference Layer|OIL]], [[DARPA Agent Markup Language|DAML]] and [[DAMLplusOIL|DAML+OIL]].@@@@1@38@@danf@17-8-2009 10631190@unknown@formal@none@1@S@OWL is intended to be used over the [[World Wide Web]], and all its elements (classes, properties and individuals) are defined as RDF [[resource (Web)|resources]], and identified by [[Uniform Resource Identifier|URI]]s.@@@@1@31@@danf@17-8-2009 10631200@unknown@formal@none@1@S@* [[KIF]] is a syntax for [[first-order logic]] that is based on [[S-expression]]s.@@@@1@13@@danf@17-8-2009 10631210@unknown@formal@none@1@S@* The [[Cyc]] project has its own ontology language called [[CycL]], based on [[first-order predicate calculus]] with some higher-order extensions.@@@@1@20@@danf@17-8-2009 10631220@unknown@formal@none@1@S@* [[Rule Interchange Format]] (RIF) and [[F-Logic]] combine ontologies and rules.@@@@1@11@@danf@17-8-2009 10631230@unknown@formal@none@1@S@* The [[Gellish]] language includes rules for its own extension and thus integrates an ontology with an ontology language.@@@@1@19@@danf@17-8-2009 10631240@unknown@formal@none@1@S@== Relation to the philosophical term ==@@@@1@7@@danf@17-8-2009 10631250@unknown@formal@none@1@S@The term ''ontology'' has its origin in [[ontology|philosophy]], where it is the name of one fundamental branch of [[metaphysics]], concerned with analyzing various types or modes of ''existence'', often with special attention to the relations between particulars and universals, between intrinsic and extrinsic 
properties, and between essence and existence.@@@@1@49@@danf@17-8-2009 10631260@unknown@formal@none@1@S@According to [[Tom Gruber]] at [[Stanford University]], the meaning of ''ontology'' in the context of computer science is “a description of the concepts and relationships that can exist for an [[Software agent|agent]] or a community of agents.”@@@@1@37@@danf@17-8-2009 10631270@unknown@formal@none@1@S@He goes on to specify that an ontology is generally written “as a set of definitions of formal vocabulary.”@@@@1@19@@danf@17-8-2009 10631280@unknown@formal@none@1@S@What ontology has in common in both computer science and philosophy is the representation of entities, ideas, and events, along with their properties and relations, according to a system of categories.@@@@1@31@@danf@17-8-2009 10631290@unknown@formal@none@1@S@In both fields, one finds considerable work on problems of ontological relativity (e.g. [[Quine]] and [[Kripke]] in philosophy; [[John F. Sowa|Sowa]] and [[Nicola Guarino|Guarino]] in computer science; see Sowa, John F., "Top-level ontological categories", International Journal of Human-Computer Studies, v. 43 (November/December 1995), p. 669-85), and debates concerning whether a normative ontology is viable (e.g. debates over [[foundationalism]] in philosophy, debates over the [[Cyc]] project in AI).@@@@1@34@@danf@17-8-2009 10631330@unknown@formal@none@1@S@Differences between the two are largely matters of focus.@@@@1@9@@danf@17-8-2009 10631340@unknown@formal@none@1@S@Philosophers are less concerned with establishing fixed, controlled vocabularies than are researchers in computer science, while computer scientists are less involved in discussions of first principles (such as debating whether there are such things as fixed essences, or whether entities must be ontologically more primary than processes).@@@@1@47@@danf@17-8-2009 10631350@unknown@formal@none@1@S@During the second half of the 20th century, philosophers extensively debated the possible methods or approaches to building ontologies, without actually ''building'' any very elaborate ontologies themselves.@@@@1@27@@danf@17-8-2009 10631360@unknown@formal@none@1@S@By contrast, computer scientists were building some large and robust ontologies (such as [[WordNet]] and [[Cyc]]) with comparatively little debate over ''how'' they were built.@@@@1@25@@danf@17-8-2009 10631370@unknown@formal@none@1@S@In the early years of the 21st century, the interdisciplinary project of [[cognitive science]] has been bringing the two circles of scholars closer together.@@@@1@24@@danf@17-8-2009 10631380@unknown@formal@none@1@S@For example, there is talk of a "computational turn in philosophy" which includes philosophers analyzing the formal ontologies of computer science (sometimes even working directly with the software), while researchers in computer science have been making more references to those philosophers who work on ontology (sometimes with direct consequences for their methods).@@@@1@52@@danf@17-8-2009 10631390@unknown@formal@none@1@S@Still, many scholars in both fields are uninvolved in this trend of cognitive science, and continue to work independently of one another, pursuing separately their different concerns.@@@@1@27@@danf@17-8-2009 10631400@unknown@formal@none@1@S@==Resources==@@@@1@1@@danf@17-8-2009 10631410@unknown@formal@none@1@S@===Examples of published ontologies ===@@@@1@5@@danf@17-8-2009
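The structures described earlier can be made concrete with a small, self-contained sketch. The following Python fragment is a hypothetical toy encoding of the automobile example (individuals, attributes, ''is-a'' subsumption, ''part-of'', and the domain-specific ''made-in'' and ''is-in'' relations); all names and data are invented for illustration and do not correspond to any of the published ontologies listed below.

# Hypothetical toy encoding of the automobile example; the object names,
# attribute names and helper function are invented for illustration and
# are not taken from any published ontology or ontology language.
objects = {
    "Car": {},
    "4-Wheel Drive Car": {"is-a": ["Car"]},
    "Ford Explorer": {
        "is-a": ["4-Wheel Drive Car"],
        "number-of-doors": 4,
        "engine": ["4.0L", "4.6L"],      # an attribute value may be a list
        "transmission": "6-speed",
        "made-in": "Louisville",         # domain-specific relation
    },
    "Ford Bronco": {"is-a": ["Car"], "successor": "Ford Explorer"},
    "Steering Wheel": {"part-of": ["Ford Explorer"]},   # meronymy (part-of)
    "Louisville": {"is-in": ["Kentucky"]},
    "Kentucky": {"is-in": ["USA"]},
    "USA": {},
}

def holds_transitively(relation, start, target):
    """Follow a transitive relation (e.g. 'is-a' or 'is-in') from start to target."""
    for parent in objects.get(start, {}).get(relation, []):
        if parent == target or holds_transitively(relation, parent, target):
            return True
    return False

# "Which cars are made in the USA?"
print([name for name, props in objects.items()
       if holds_transitively("is-a", name, "Car")
       and holds_transitively("is-in", props.get("made-in", ""), "USA")])
# -> ['Ford Explorer']

Answering the sample question amounts to combining the subsumption hierarchy with the transitive ''is-in'' relation, which is the kind of inference an ontology-aware system is expected to perform.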
10631420@unknown@formal@none@1@S@* [[Dublin Core]], a simple ontology for documents and publishing.@@@@1@10@@danf@17-8-2009 10631430@unknown@formal@none@1@S@* [[Cyc]] for formal representation of the universe of discourse.@@@@1@10@@danf@17-8-2009 10631440@unknown@formal@none@1@S@* [[Suggested Upper Merged Ontology]], which is a formal upper ontology@@@@1@11@@danf@17-8-2009 10631450@unknown@formal@none@1@S@* [http://www.ifomis.org/bfo/ Basic Formal Ontology (BFO)], a formal upper ontology designed to support scientific research@@@@1@15@@danf@17-8-2009 10631460@unknown@formal@none@1@S@* [[Gellish English dictionary]], an ontology that includes a dictionary and taxonomy that includes an upper ontology and a lower ontology that focusses on industrial and business applications in engineering, technology and procurement.@@@@1@33@@danf@17-8-2009 10631470@unknown@formal@none@1@S@* [http://www.fb10.uni-bremen.de/anglistik/langpro/webspace/jb/gum/index.htm Generalized Upper Model], a linguistically-motivated ontology for mediating between clients systems and natural language technology@@@@1@17@@danf@17-8-2009 10631480@unknown@formal@none@1@S@* [[WordNet]] Lexical reference system@@@@1@5@@danf@17-8-2009 10631490@unknown@formal@none@1@S@* [[OBO Foundry]]: a suite of interoperable reference ontologies in biomedicine.@@@@1@11@@danf@17-8-2009 10631500@unknown@formal@none@1@S@* The [[Ontology for Biomedical Investigations]] is an open access, integrated ontology for the description of biological and clinical investigations.@@@@1@20@@danf@17-8-2009 10631510@unknown@formal@none@1@S@* [http://colab.cim3.net/file/work/SICoP/ontac/COSMO/ COSMO]: An OWL ontology that is a merger of the basic elements of the OpenCyc and SUMO ontologies, with additional elements.@@@@1@23@@danf@17-8-2009 10631520@unknown@formal@none@1@S@* [[Gene Ontology]] for [[genomics]]@@@@1@5@@danf@17-8-2009 10631530@unknown@formal@none@1@S@* [http://pir.georgetown.edu/pro/ PRO], the Protein Ontology of the Protein Information Resource, Georgetown University.@@@@1@13@@danf@17-8-2009 10631540@unknown@formal@none@1@S@* [http://proteinontology.info/ Protein Ontology] for [[proteomics]]@@@@1@6@@danf@17-8-2009 10631550@unknown@formal@none@1@S@* [http://sig.biostr.washington.edu/projects/fm/AboutFM.html Foundational Model of Anatomy] for human anatomy@@@@1@9@@danf@17-8-2009 10631560@unknown@formal@none@1@S@* [[SBO]], the Systems Biology Ontology, for computational models in biology@@@@1@11@@danf@17-8-2009 10631570@unknown@formal@none@1@S@* [http://www.plantontology.org/ Plant Ontology] for plant structures and growth/development stages, etc.@@@@1@11@@danf@17-8-2009 10631580@unknown@formal@none@1@S@* [[CIDOC|CIDOC CRM]] (Conceptual Reference Model) - an ontology for "[[cultural heritage]] information".@@@@1@13@@danf@17-8-2009 10631590@unknown@formal@none@1@S@* [http://www.linguistics-ontology.org/gold.html GOLD ] ('''G'''eneral '''O'''ntology for [[descriptive linguistics|'''L'''inguistic '''D'''escription ]])@@@@1@11@@danf@17-8-2009 10631600@unknown@formal@none@1@S@* [http://www.landcglobal.com/pages/linkbase.php Linkbase] A formal representation of the biomedical domain, founded upon [http://www.ifomis.org/bfo/ Basic Formal Ontology (BFO)].@@@@1@17@@danf@17-8-2009 10631610@unknown@formal@none@1@S@* [http://www.loa-cnr.it/Ontologies.html Foundational, Core and Linguistic Ontologies]@@@@1@7@@danf@17-8-2009 10631620@unknown@formal@none@1@S@* [[ThoughtTreasure]] ontology@@@@1@3@@danf@17-8-2009 10631630@unknown@formal@none@1@S@* [[LPL]] Lawson Pattern 
Language@@@@1@5@@danf@17-8-2009 10631640@unknown@formal@none@1@S@* [[TIME-ITEM]] Topics for Indexing Medical Education@@@@1@7@@danf@17-8-2009 10631650@unknown@formal@none@1@S@* [[POPE]] Purdue Ontology for Pharmaceutical Engineering@@@@1@7@@danf@17-8-2009 10631660@unknown@formal@none@1@S@* [[IDEAS Group]] A formal ontology for enterprise architecture being developed by the Australian, Canadian, UK and U.S. Defence Depts. [http://www.ideasgroup.org The IDEAS Group Website]@@@@1@25@@danf@17-8-2009 10631670@unknown@formal@none@1@S@* [http://www.eden-study.org/articles/2007/problems-ontology-programs_ao.pdf program abstraction taxonomy]@@@@1@5@@danf@17-8-2009 10631680@unknown@formal@none@1@S@* [http://sweet.jpl.nasa.gov/ SWEET] Semantic Web for Earth and Environmental Terminology@@@@1@10@@danf@17-8-2009 10631690@unknown@formal@none@1@S@* [http://www.cellcycleontology.org/ CCO] The Cell-Cycle Ontology is an application ontology that represents the cell cycle@@@@1@15@@danf@17-8-2009 10631700@unknown@formal@none@1@S@===Ontology libraries===@@@@1@2@@danf@17-8-2009 10631710@unknown@formal@none@1@S@The development of ontologies for the Web has led to the appearance of services providing lists or directories of ontologies with a search facility.@@@@1@23@@danf@17-8-2009 10631720@unknown@formal@none@1@S@Such directories have been called ontology libraries.@@@@1@7@@danf@17-8-2009 10631730@unknown@formal@none@1@S@The following are static libraries of human-selected ontologies.@@@@1@8@@danf@17-8-2009 10631740@unknown@formal@none@1@S@* The [http://www.daml.org/ontologies/ DAML Ontology Library] maintains a legacy of ontologies in DAML.@@@@1@13@@danf@17-8-2009 10631750@unknown@formal@none@1@S@* The [http://protegewiki.stanford.edu/index.php/Protege_Ontology_Library Protege Ontology Library] contains a set of OWL, Frame-based and other format ontologies.@@@@1@16@@danf@17-8-2009 10631760@unknown@formal@none@1@S@* [http://www.schemaweb.info/ SchemaWeb] is a directory of RDF schemata expressed in RDFS, OWL and DAML+OIL.@@@@1@15@@danf@17-8-2009 10631770@unknown@formal@none@1@S@The following are both directories and search engines.@@@@1@8@@danf@17-8-2009 10631780@unknown@formal@none@1@S@They include crawlers searching the Web for well-formed ontologies.@@@@1@9@@danf@17-8-2009 10631790@unknown@formal@none@1@S@* [[Swoogle]] is a directory and search engine for all RDF resources available on the Web, including ontologies.@@@@1@18@@danf@17-8-2009 10631800@unknown@formal@none@1@S@* The [http://olp.dfki.de/OntoSelect/ OntoSelect] Ontology Library offers similar services for RDF/S, DAML and OWL ontologies.@@@@1@15@@danf@17-8-2009 10631810@unknown@formal@none@1@S@* [http://www.w3.org/2004/ontaria/ Ontaria] is a "searchable and browsable directory of semantic web data", with a focus on RDF vocabularies with OWL ontologies.@@@@1@22@@danf@17-8-2009 10631820@unknown@formal@none@1@S@* The [http://www.obofoundry.org/ OBO Foundry / Bioportal] is a suite of interoperable reference ontologies in biology and biomedicine.@@@@1@17@@danf@17-8-2009 10640010@unknown@formal@none@1@S@
OpenOffice.org
@@@@1@1@@danf@17-8-2009 10640020@unknown@formal@none@1@S@'''OpenOffice.org''' ('''OO.o''' or '''OOo''') is a [[cross-platform]] [[office suite|office application suite]] available for a number of different computer [[operating system]]s.@@@@1@20@@danf@17-8-2009 10640030@unknown@formal@none@1@S@It supports the ISO standard '''[[OpenDocument]] Format (ODF)''' for data interchange as its default [[file format]], as well as [[Microsoft Office]] '97–2003 formats, [[Microsoft Office]] '2007 format (in version 3), among many others.@@@@1@33@@danf@17-8-2009 10640040@unknown@formal@none@1@S@OpenOffice.org was originally derived from [[StarOffice]], an office suite developed by [[StarDivision]] and acquired by [[Sun Microsystems]] in August 1999.@@@@1@20@@danf@17-8-2009 10640050@unknown@formal@none@1@S@The [[source code]] of the suite was released in July 2000 with the aim of reducing the dominant [[market share]] of [[Microsoft Office]] by providing a free, open and high-quality alternative; later versions of StarOffice are based upon OpenOffice.org with additional proprietary components.@@@@1@43@@danf@17-8-2009 10640060@unknown@formal@none@1@S@OpenOffice.org is [[free software]], available under the [[GNU Lesser General Public License]] (LGPL).@@@@1@13@@danf@17-8-2009 10640070@unknown@formal@none@1@S@The project and software are informally referred to as ''OpenOffice'', but this term is a [[trademark]] held by another party, requiring the project to adopt ''OpenOffice.org'' as its formal name.@@@@1@30@@danf@17-8-2009 10640080@unknown@formal@none@1@S@== History==@@@@1@2@@danf@17-8-2009 10640090@unknown@formal@none@1@S@Originally developed as the [[proprietary software]] application suite StarOffice by the German company [[StarDivision]], the code was purchased in 1999 by Sun Microsystems.@@@@1@23@@danf@17-8-2009 10640100@unknown@formal@none@1@S@In August 1999 version 5.2 of StarOffice was made available free of charge.@@@@1@13@@danf@17-8-2009 10640110@unknown@formal@none@1@S@On [[July 19]], [[2000]], Sun Microsystems announced that it was making the source code of StarOffice available for download under both the LGPL and the [[Sun Industry Standards Source License]] (SISSL) with the intention of building an open source development community around the software.@@@@1@44@@danf@17-8-2009 10640120@unknown@formal@none@1@S@The new project was known as OpenOffice.org, and its website went live on [[October 13]], [[2000]].@@@@1@16@@danf@17-8-2009 10640130@unknown@formal@none@1@S@Work on version 2.0 began in early 2003 with the following goals: better interoperability with Microsoft Office; better performance, with improved speed and lower memory usage; greater [[Scripting language|scripting]] capabilities; better integration, particularly with [[GNOME]]; an easier-to-find and use database front-end for creating reports, forms and queries; a new built-in [[SQL]] database; and improved [[usability]].@@@@1@55@@danf@17-8-2009 10640140@unknown@formal@none@1@S@A [[beta version]] was released on [[March 4]], [[2005]].@@@@1@9@@danf@17-8-2009 10640150@unknown@formal@none@1@S@On [[September 2]], [[2005]] Sun announced that it was retiring the SISSL.@@@@1@12@@danf@17-8-2009 10640160@unknown@formal@none@1@S@As a consequence, the OpenOffice.org Community Council announced that it would no longer [[dual license]] the office suite, and future versions would use only the LGPL.@@@@1@26@@danf@17-8-2009 10640170@unknown@formal@none@1@S@On [[October 20]], [[2005]], OpenOffice.org 2.0 was formally released to the 
public.@@@@1@12@@danf@17-8-2009 10640180@unknown@formal@none@1@S@Eight weeks after the release of Version 2.0, an update, OpenOffice.org 2.0.1, was released.@@@@1@14@@danf@17-8-2009 10640190@unknown@formal@none@1@S@It fixed minor bugs and introduced new features.@@@@1@8@@danf@17-8-2009 10640200@unknown@formal@none@1@S@As of the 2.0.3 release, OpenOffice.org changed its release cycle from 18-months to releasing updates, feature enhancements and bug fixes every three months.@@@@1@23@@danf@17-8-2009 10640210@unknown@formal@none@1@S@Currently, new versions including new features are released every six months (so-called "feature releases") alternating with so-called "bug fix releases" which are being released between two feature releases (Every 3 months).@@@@1@31@@danf@17-8-2009 10640220@unknown@formal@none@1@S@=== StarOffice ===@@@@1@3@@danf@17-8-2009 10640230@unknown@formal@none@1@S@Sun subsidizes the development of OpenOffice.org in order to use it as a base for its commercial [[proprietary software|proprietary]] StarOffice application software.@@@@1@22@@danf@17-8-2009 10640240@unknown@formal@none@1@S@Releases of StarOffice since version 6.0 have been based on the OpenOffice.org source code, with some additional proprietary components, including:@@@@1@20@@danf@17-8-2009 10640250@unknown@formal@none@1@S@* Additional bundled fonts (especially [[CJK|East Asian language]] fonts).@@@@1@9@@danf@17-8-2009 10640260@unknown@formal@none@1@S@* [[Adabas D]] database.@@@@1@4@@danf@17-8-2009 10640270@unknown@formal@none@1@S@* Additional document [[Template (word processing)|templates]].@@@@1@6@@danf@17-8-2009 10640280@unknown@formal@none@1@S@* [[Clip art]].@@@@1@3@@danf@17-8-2009 10640290@unknown@formal@none@1@S@* Sorting functionality for Asian versions.@@@@1@6@@danf@17-8-2009 10640300@unknown@formal@none@1@S@* Additional file filters.@@@@1@4@@danf@17-8-2009 10640310@unknown@formal@none@1@S@* Migration assessment tool (Enterprise Edition).@@@@1@6@@danf@17-8-2009 10640320@unknown@formal@none@1@S@* Macro migration tool (Enterprise Edition).@@@@1@6@@danf@17-8-2009 10640330@unknown@formal@none@1@S@* Configuration management tool (Enterprise Edition).@@@@1@6@@danf@17-8-2009 10640340@unknown@formal@none@1@S@OpenOffice.org, therefore, inherited many features from the original StarOffice upon which it was based including the [[OpenOffice.org XML]] file format which it retained until version 2, when it was replaced by the ISO standard [[OpenDocument]] Format (ODF).@@@@1@37@@danf@17-8-2009 10640350@unknown@formal@none@1@S@== Features ==@@@@1@3@@danf@17-8-2009 10640360@unknown@formal@none@1@S@According to its [[mission statement]], the OpenOffice.org project aims "''To create, as a community, the leading international office suite that will run on all major platforms and provide access to all functionality and data through open-component based APIs and an XML-based file format.''"@@@@1@43@@danf@17-8-2009 10640370@unknown@formal@none@1@S@OpenOffice.org aims to compete with Microsoft Office and emulate its look and feel where suitable.@@@@1@15@@danf@17-8-2009 10640380@unknown@formal@none@1@S@It can read and write most of the [[file formats]] found in Microsoft Office, and many other applications; an essential feature of the suite for many users.@@@@1@27@@danf@17-8-2009 10640390@unknown@formal@none@1@S@OpenOffice.org has been found to be able to open files of older versions of Microsoft Office and damaged files that newer versions of Microsoft Office itself cannot open.@@@@1@28@@danf@17-8-2009 
10640400@unknown@formal@none@1@S@However, it cannot open older Word for Macintosh (MCW) files.@@@@1@10@@danf@17-8-2009 10640410@unknown@formal@none@1@S@=== Platforms ===@@@@1@3@@danf@17-8-2009 10640420@unknown@formal@none@1@S@Platforms for which OO.o is available include [[Microsoft Windows]], [[Linux]], [[Solaris Operating System|Solaris]], [[BSD]], [[OpenVMS]], [[OS/2]] and [[IRIX]].@@@@1@18@@danf@17-8-2009 10640430@unknown@formal@none@1@S@The current primary development platforms are Microsoft Windows, Linux and Solaris.@@@@1@11@@danf@17-8-2009 10640440@unknown@formal@none@1@S@A port for [[Mac OS X]] exists for OS X machines which have the [[X Window System]] component installed.@@@@1@19@@danf@17-8-2009 10640450@unknown@formal@none@1@S@A port to OS X's native [[Aqua (user interface)|Aqua user interface]] is in progress, and is scheduled for completion for the 3.0 milestone.@@@@1@23@@danf@17-8-2009 10640460@unknown@formal@none@1@S@[[NeoOffice]] is an independent [[Fork (software development)|fork]] of OpenOffice, specially adapted for Mac OS X.@@@@1@15@@danf@17-8-2009 10640470@unknown@formal@none@1@S@=== Version compatibility ===@@@@1@4@@danf@17-8-2009 10640480@unknown@formal@none@1@S@*Windows 95: up to v1.1.5@@@@1@5@@danf@17-8-2009 10640490@unknown@formal@none@1@S@*Windows 98-Vista: up to v2.4, development releases of v3.0@@@@1@9@@danf@17-8-2009 10640500@unknown@formal@none@1@S@*Mac OS 10.2: up to v1.1.2@@@@1@6@@danf@17-8-2009 10640510@unknown@formal@none@1@S@*Mac OS 10.3: up to v2.1@@@@1@6@@danf@17-8-2009 10640520@unknown@formal@none@1@S@*Mac OS 10.4-10.5: up to v2.4, development releases of v3.0 ([[Apple-Intel architecture|intel]] only)@@@@1@13@@danf@17-8-2009 10640530@unknown@formal@none@1@S@*OS/2 and eComStation: up to v2.0.4@@@@1@6@@danf@17-8-2009 10640540@unknown@formal@none@1@S@=== Components ===@@@@1@3@@danf@17-8-2009 10640550@unknown@formal@none@1@S@OpenOffice.org is a collection of applications that work together closely to provide the features expected from a modern office suite.@@@@1@20@@danf@17-8-2009 10640560@unknown@formal@none@1@S@Many of the components are designed to mirror those available in Microsoft Office.@@@@1@13@@danf@17-8-2009 10640570@unknown@formal@none@1@S@The components available include:@@@@1@4@@danf@17-8-2009 10640580@unknown@formal@none@1@S@*[[QuickStart]]er@@@@1@1@@danf@17-8-2009 10640590@unknown@formal@none@1@S@:A small program for Windows and Linux that runs when the computer starts for the first time.@@@@1@17@@danf@17-8-2009 10640600@unknown@formal@none@1@S@It loads the core files and libraries for OpenOffice.org during computer startup and allows the suite applications to start more quickly when selected later.@@@@1@24@@danf@17-8-2009 10640610@unknown@formal@none@1@S@The amount of time it takes to open OpenOffice.org applications was a common complaint in version 1.0 of the suite.@@@@1@20@@danf@17-8-2009 10640620@unknown@formal@none@1@S@Substantial improvements were made in this area for version 2.2.@@@@1@10@@danf@17-8-2009 10640630@unknown@formal@none@1@S@*The [[Macro (computer science)|macro]] recorder@@@@1@5@@danf@17-8-2009 10640640@unknown@formal@none@1@S@:Is used to record user actions and replay them later to help with automating tasks, using [[OpenOffice.org Basic]] (see [[OpenOffice.org#OpenOffice.org Basic|below]]).@@@@1@21@@danf@17-8-2009 10640650@unknown@formal@none@1@S@It is not possible to download these components individually on Windows, though they can be installed separately.@@@@1@17@@danf@17-8-2009 
10640660@unknown@formal@none@1@S@Most Linux distributions break the components into individual packages which may be downloaded and installed separately.@@@@1@16@@danf@17-8-2009 10640670@unknown@formal@none@1@S@=== OpenOffice.org Basic ===@@@@1@4@@danf@17-8-2009 10640680@unknown@formal@none@1@S@OpenOffice.org Basic is a programming language similar to Microsoft [[Visual Basic for Applications]] (VBA) based on [[StarOffice Basic]].@@@@1@18@@danf@17-8-2009 10640690@unknown@formal@none@1@S@In addition to its own macros, the upcoming Novell edition of OpenOffice.org 2.0 supports running Microsoft VBA macros, a feature expected to be incorporated into the mainstream version soon.@@@@1@28@@danf@17-8-2009 10640700@unknown@formal@none@1@S@OpenOffice.org Basic is available in the Writer and Calc applications.@@@@1@10@@danf@17-8-2009 10640710@unknown@formal@none@1@S@It is written in functions called subroutines or macros, with each macro performing a different task, such as counting the words in a paragraph.@@@@1@24@@danf@17-8-2009 10640720@unknown@formal@none@1@S@OpenOffice.org Basic is especially useful in doing repetitive tasks that have not been integrated in the program.@@@@1@17@@danf@17-8-2009 10640730@unknown@formal@none@1@S@As the OpenOffice.org database, called "Base", uses documents created under the Writer application for reports and forms, one could say that Base can also be programmed with OpenOffice.org Basic.@@@@1@29@@danf@17-8-2009 10640740@unknown@formal@none@1@S@== File formats ==@@@@1@4@@danf@17-8-2009 10640750@unknown@formal@none@1@S@OpenOffice.org pioneered the ISO/IEC standard [[OpenDocument]] file formats (ODF), which it uses natively, by default.@@@@1@15@@danf@17-8-2009 10640760@unknown@formal@none@1@S@It also supports reading (and in some cases writing) a large number of legacy proprietary file formats (e.g. [[WordPerfect]] through libwpd, [[StarOffice]], [[Lotus software]], [[Microsoft Works|MS Works]] through libwps, and [[Rich Text Format]]), most notably including [[Microsoft Office]] formats. The OpenDocument specification itself was "approved for release as an ISO and IEC International Standard" under the name ISO/IEC 26300:2006.@@@@1@59@@danf@17-8-2009 10640770@unknown@formal@none@1@S@=== Microsoft Office interoperability ===@@@@1@5@@danf@17-8-2009 10640780@unknown@formal@none@1@S@In response to Microsoft's recent movement towards using the [[Office Open XML]] format in [[Microsoft Office 2007]], [[Novell]] has released an [[Office Open XML]] converter for OOo under a liberal [[BSD license]] (along with [[GNU GPL]] and [[LGPL]] licensed libraries) that will be submitted for inclusion into the OpenOffice.org project.@@@@1@50@@danf@17-8-2009 10640790@unknown@formal@none@1@S@This allows OOo to read and write Microsoft OpenXML-formatted word processing documents (.docx) in OpenOffice.org.@@@@1@15@@danf@17-8-2009 10640800@unknown@formal@none@1@S@Currently it works only with the latest Novell edition of OpenOffice.org.@@@@1@11@@danf@17-8-2009 10640810@unknown@formal@none@1@S@[[Sun Microsystems]] has developed an ODF plugin for Microsoft Office which enables users of Microsoft Office Word, Excel and PowerPoint to read and write ODF documents.@@@@1@26@@danf@17-8-2009 10640820@unknown@formal@none@1@S@The plugin currently works with Microsoft Office 2003, Microsoft Office XP and Microsoft Office 2000.@@@@1@15@@danf@17-8-2009 10640830@unknown@formal@none@1@S@Support for Microsoft Office 2007 is only available in combination with Microsoft Office 2007 SP1.@@@@1@15@@danf@17-8-2009
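Because the OpenDocument format described above is an openly specified package of XML streams inside a ZIP archive, a document can be examined without any office suite installed. The following Python sketch is a minimal illustration of that packaging (the file name ''example.odt'' is a placeholder for any ODF text document); it is not part of OpenOffice.org or of the plugins discussed here.

import zipfile
import xml.etree.ElementTree as ET

# An ODF document is a ZIP archive; the main document body lives in content.xml.
# "example.odt" is a placeholder path.
with zipfile.ZipFile("example.odt") as package:
    print(package.namelist())   # typically: mimetype, content.xml, styles.xml, meta.xml, ...
    with package.open("content.xml") as stream:
        root = ET.parse(stream).getroot()

# Element tags carry ODF namespaces, but the plain character data can be
# collected without naming the namespaces explicitly.
print("".join(root.itertext()))

This ZIP-plus-XML layout is also what allows external tools, such as the Perl modules mentioned below, to process OpenOffice.org files without going through the suite itself.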
10640840@unknown@formal@none@1@S@Several software companies (including Microsoft and Novell) are working on an add-in for Microsoft Office that allows reading and writing ODF files.@@@@1@22@@danf@17-8-2009 10640850@unknown@formal@none@1@S@Currently it works only for Microsoft Word 2007 / XP / 2003.@@@@1@12@@danf@17-8-2009 10640860@unknown@formal@none@1@S@Microsoft provides a compatibility pack to read and write Office Open XML files with Office 2000, XP and 2003.@@@@1@19@@danf@17-8-2009 10640870@unknown@formal@none@1@S@The compatibility pack can also be used as a stand-alone converter with Microsoft Office 97.@@@@1@15@@danf@17-8-2009 10640880@unknown@formal@none@1@S@This might be helpful to convert older Microsoft Office files via Office Open XML to ODF if a direct conversion doesn't work as expected.@@@@1@24@@danf@17-8-2009 10640890@unknown@formal@none@1@S@The Office compatibility pack however does not install for Office 2000 or Office XP on [[Windows 9x]].@@@@1@17@@danf@17-8-2009 10640900@unknown@formal@none@1@S@Note that some office applications built with Microsoft components may refuse to import OpenOffice data.@@@@1@15@@danf@17-8-2009 10640910@unknown@formal@none@1@S@[[The Sage Group]]'s Simply Accounting, for example, can import Excel's .xls files, but refuses to accept OpenOffice.org-generated .xls files for the reason that the OOo .xls files are not "genuine Microsoft" .xls files.@@@@1@33@@danf@17-8-2009 10640920@unknown@formal@none@1@S@== Development ==@@@@1@3@@danf@17-8-2009 10640930@unknown@formal@none@1@S@=== Overview ===@@@@1@3@@danf@17-8-2009 10640940@unknown@formal@none@1@S@The OpenOffice.org [[Application Programming Interface|API]] is based on a component technology known as [[Universal Network Objects]] (UNO).@@@@1@17@@danf@17-8-2009 10640950@unknown@formal@none@1@S@It consists of a wide range of interfaces defined in a [[CORBA]]-like [[interface description language]].@@@@1@15@@danf@17-8-2009 10640960@unknown@formal@none@1@S@The [[document file format]] used is based on [[XML]] and several export and import filters.@@@@1@15@@danf@17-8-2009 10640970@unknown@formal@none@1@S@All external formats read by OpenOffice.org are converted back and forth from an internal XML representation.@@@@1@16@@danf@17-8-2009 10640980@unknown@formal@none@1@S@By using [[data compression|compression]] when saving [[XML]] to disk, files are generally smaller than the equivalent binary Microsoft Office documents.@@@@1@20@@danf@17-8-2009 10640990@unknown@formal@none@1@S@The native file format for storing documents in version 1.0 was used as the basis of the [[OASIS (organization)|OASIS]] OpenDocument file format standard, which has become the default file format in version 2.0.@@@@1@33@@danf@17-8-2009 10641000@unknown@formal@none@1@S@Development versions of the suite are released every few weeks on the developer zone of the OpenOffice.org website.@@@@1@18@@danf@17-8-2009 10641010@unknown@formal@none@1@S@The releases are meant for those who wish to test new features or are simply curious about forthcoming changes; they are not suitable for production use.@@@@1@26@@danf@17-8-2009 10641020@unknown@formal@none@1@S@=== Native desktop integration ===@@@@1@5@@danf@17-8-2009 10641030@unknown@formal@none@1@S@OpenOffice.org 1.0 was criticized for not having the [[look and feel]] of applications developed natively for the platforms on which it runs.@@@@1@22@@danf@17-8-2009 10641040@unknown@formal@none@1@S@Starting with version 2.0, OpenOffice.org uses native [[widget toolkit]], icons, and font-rendering 
libraries across a variety of platforms, to better match native applications and provide a smoother experience for the user.@@@@1@31@@danf@17-8-2009 10641050@unknown@formal@none@1@S@There are projects underway to further improve this integration on both [[GNOME]] and [[KDE]].@@@@1@14@@danf@17-8-2009 10641060@unknown@formal@none@1@S@This issue has been particularly pronounced on Mac OS X, whose standard user interface looks noticeably different from either Windows or [[X11]]-based desktop environments and requires the use of programming toolkits unfamiliar to most OpenOffice.org developers.@@@@1@36@@danf@17-8-2009 10641070@unknown@formal@none@1@S@There are two implementations of OpenOffice.org available for OS X:@@@@1@10@@danf@17-8-2009 10641080@unknown@formal@none@1@S@;OpenOffice.org Mac OS X (X11):@@@@1@5@@danf@17-8-2009 10641090@unknown@formal@none@1@S@This official implementation requires the installation of [[X11.app]] or [[XDarwin]], and is a close port of the well-tested Unix version.@@@@1@20@@danf@17-8-2009 10641100@unknown@formal@none@1@S@It is functionally equivalent to the Unix version, and its user interface resembles the [[look and feel]] of that version; for example, the application uses its own [[menu bar]] instead of the OS X menu at the top of the screen.@@@@1@41@@danf@17-8-2009 10641110@unknown@formal@none@1@S@It also requires system fonts to be converted to X11 format for OpenOffice.org to use them (which can be done during application installation).@@@@1@23@@danf@17-8-2009 10641120@unknown@formal@none@1@S@;OpenOffice.org Aqua:@@@@1@2@@danf@17-8-2009 10641130@unknown@formal@none@1@S@After a first step (completed) using [[Carbon (API)|Carbon]], OpenOffice.org Aqua switched to [[Cocoa (API)|Cocoa]] technology, and an [[Aqua (GUI)|Aqua]] version (based on [[Cocoa (API)|Cocoa]]) is also being developed under the aegis of OpenOffice.org, with a Beta version currently available.@@@@1@39@@danf@17-8-2009 10641140@unknown@formal@none@1@S@Sun Microsystems is collaborating with OOo to further development of the Aqua version of OpenOffice.org for Mac.@@@@1@17@@danf@17-8-2009 10641150@unknown@formal@none@1@S@=== Future ===@@@@1@3@@danf@17-8-2009 10641160@unknown@formal@none@1@S@Currently, a developed preview of OpenOffice.org 3 (OOo-dev 3.0) is available for download.@@@@1@13@@danf@17-8-2009 10641170@unknown@formal@none@1@S@Among the planned features for OOo 3.0, set to be released by September 2008 , are:@@@@1@16@@danf@17-8-2009 10641180@unknown@formal@none@1@S@* Personal Information Manager ([[Personal Information Manager|PIM]]), probably based on [[Mozilla Thunderbird|Thunderbird]]/[[Lightning (software)|Lightning]]@@@@1@13@@danf@17-8-2009 10641190@unknown@formal@none@1@S@* PDF import into Draw (to maintain correct layout of the original PDF)@@@@1@13@@danf@17-8-2009 10641200@unknown@formal@none@1@S@* [[OOXML]] document support for opening documents created in [[Office 2007]]@@@@1@11@@danf@17-8-2009 10641210@unknown@formal@none@1@S@* Support for [[Mac OS X]] [[Aqua (user interface)|Aqua]] platform@@@@1@10@@danf@17-8-2009 10641220@unknown@formal@none@1@S@* Extensions, to add third party functionality.@@@@1@7@@danf@17-8-2009 10641230@unknown@formal@none@1@S@* Presenter screen in Impress with multi-screen support@@@@1@8@@danf@17-8-2009 10641240@unknown@formal@none@1@S@=== Other projects ===@@@@1@4@@danf@17-8-2009 10641250@unknown@formal@none@1@S@A number of products are [http://wiki.services.openoffice.org/wiki/DerivedWorks derived from OpenOffice.org].@@@@1@9@@danf@17-8-2009 
10641260@unknown@formal@none@1@S@Among the more well-known ones are Sun StarOffice and NeoOffice.@@@@1@10@@danf@17-8-2009 10641270@unknown@formal@none@1@S@The OpenOffice.org site also lists a large variety of [http://wiki.services.openoffice.org/wiki/OpenOffice.org_Solutions complementary products] including groupware solutions.@@@@1@15@@danf@17-8-2009 10641280@unknown@formal@none@1@S@==== NeoOffice ====@@@@1@3@@danf@17-8-2009 10641290@unknown@formal@none@1@S@[[NeoOffice]] is an independent [[porting|port]] that integrates with [[Mac OS X|OS X]]’s [[Aqua (GUI)|Aqua]] user interface using [[Java platform|Java]], [[Carbon (API)|Carbon]] and (increasingly) [[Cocoa (API)|Cocoa]] toolkits.@@@@1@26@@danf@17-8-2009 10641300@unknown@formal@none@1@S@NeoOffice adheres fairly closely to OS X UI standards (for example, using native pull-down menus), and has direct access to OS X’s installed fonts and printers.@@@@1@26@@danf@17-8-2009 10641310@unknown@formal@none@1@S@Its releases lag behind the official OpenOffice.org X11 releases, due to its small development team and the concurrent development of the technology used to port the user interface.@@@@1@28@@danf@17-8-2009 10641320@unknown@formal@none@1@S@Other projects run alongside the main OpenOffice.org project and are easier to contribute to.@@@@1@14@@danf@17-8-2009 10641330@unknown@formal@none@1@S@These include documentation, [[internationalisation and localisation]] and the API.@@@@1@9@@danf@17-8-2009 10641340@unknown@formal@none@1@S@==== OpenGroupware.org ====@@@@1@3@@danf@17-8-2009 10641350@unknown@formal@none@1@S@[[OpenGroupware.org]] is a set of extension programs to allow the sharing of OpenOffice.org documents, calendars, address books, [[e-mail]]s, [[instant messenger|instant messaging]] and blackboards, and provide access to other [[collaborative software|groupware]] applications.@@@@1@31@@danf@17-8-2009 10641360@unknown@formal@none@1@S@There is also an effort to create and share assorted document templates and other useful additions at OOExtras.@@@@1@18@@danf@17-8-2009 10641370@unknown@formal@none@1@S@A set of [[Perl]] extensions is available through the [[CPAN]] in order to allow OpenOffice.org document processing by external programs.@@@@1@20@@danf@17-8-2009 10641380@unknown@formal@none@1@S@These libraries do not use the OpenOffice.org API.@@@@1@8@@danf@17-8-2009 10641390@unknown@formal@none@1@S@They directly read or write the OpenOffice.org files using Perl standard file [[codec|compression/decompression]], XML access and [[UTF-8]] encoding modules.@@@@1@19@@danf@17-8-2009 10641400@unknown@formal@none@1@S@==== Portable ====@@@@1@3@@danf@17-8-2009 10641410@unknown@formal@none@1@S@A distribution of OpenOffice.org called OpenOffice.org Portable is designed to run the suite from a [[USB flash drive]].@@@@1@18@@danf@17-8-2009 10641420@unknown@formal@none@1@S@==== OxygenOffice Professional ====@@@@1@4@@danf@17-8-2009 10641430@unknown@formal@none@1@S@An enhancement of OpenOffice.org, providing: Current Version: 2.4@@@@1@8@@danf@17-8-2009 10641440@unknown@formal@none@1@S@* Possibility to run Visual Basic for Application (VBA) macros in Calc (for testing)@@@@1@14@@danf@17-8-2009 10641450@unknown@formal@none@1@S@* Improved Calc HTML export@@@@1@5@@danf@17-8-2009 10641460@unknown@formal@none@1@S@* Enhanced Access support for Base@@@@1@6@@danf@17-8-2009 10641470@unknown@formal@none@1@S@* Security fixes@@@@1@3@@danf@17-8-2009 10641480@unknown@formal@none@1@S@* Enhanced performance@@@@1@3@@danf@17-8-2009 10641490@unknown@formal@none@1@S@* Enhanced 
color-palette@@@@1@3@@danf@17-8-2009 10641500@unknown@formal@none@1@S@* Enhanced help menu, additional User’s Manual, and extended tips for beginners@@@@1@12@@danf@17-8-2009 10641510@unknown@formal@none@1@S@Optionally it provides, free for personal and professional use:@@@@1@9@@danf@17-8-2009 10641520@unknown@formal@none@1@S@* More than 3,200 graphics, both clip art and photos.@@@@1@10@@danf@17-8-2009 10641530@unknown@formal@none@1@S@* Several templates and sample documents@@@@1@6@@danf@17-8-2009 10641540@unknown@formal@none@1@S@* Over 90 free fonts.@@@@1@5@@danf@17-8-2009 10641550@unknown@formal@none@1@S@* Additional tools like OOoWikipedia@@@@1@5@@danf@17-8-2009 10641560@unknown@formal@none@1@S@====Extensions====@@@@1@1@@danf@17-8-2009 10641570@unknown@formal@none@1@S@Since version 2.0.4, OpenOffice.org has supported extensions in a similar manner to [[Mozilla Firefox]].@@@@1@14@@danf@17-8-2009 10641580@unknown@formal@none@1@S@Extensions make it easy to add new functionality to an existing OpenOffice.org installation.@@@@1@13@@danf@17-8-2009 10641590@unknown@formal@none@1@S@The [http://extensions.services.openoffice.org/most_pop_ext OpenOffice.org Extension Repository] lists already more than 80 extensions.@@@@1@11@@danf@17-8-2009 10641600@unknown@formal@none@1@S@Developers can easily build new extensions for OpenOffice.org, for example by using the [http://wiki.services.openoffice.org/wiki/OpenOffice_NetBeans_Integration OpenOffice.org API Plugin for NetBeans].@@@@1@19@@danf@17-8-2009 10641610@unknown@formal@none@1@S@==== The OpenOffice.org Bibliographic Project ====@@@@1@6@@danf@17-8-2009 10641620@unknown@formal@none@1@S@This aims to incorporate a powerful [[reference management software]] into the suite.@@@@1@12@@danf@17-8-2009 10641630@unknown@formal@none@1@S@The new major addition is slated for inclusion with the standard OpenOffice.org release on late-2007 to mid-2008, or possibly later depending upon the availability of programmers.@@@@1@26@@danf@17-8-2009 10641640@unknown@formal@none@1@S@=== Security ===@@@@1@3@@danf@17-8-2009 10641650@unknown@formal@none@1@S@OpenOffice.org includes a security team, and as of June 2008 the security organization [[Secunia]] reports no known unpatched security flaws for the software.@@@@1@23@@danf@17-8-2009 10641660@unknown@formal@none@1@S@[[Kaspersky Lab]] has shown a [[proof of concept]] virus for OpenOffice.org.@@@@1@11@@danf@17-8-2009 10641670@unknown@formal@none@1@S@This shows OOo viruses are possible, but there is no known virus "in the wild".@@@@1@15@@danf@17-8-2009 10641680@unknown@formal@none@1@S@In a private meeting of the French Ministry of Defense, macro-related security issues were raised.@@@@1@15@@danf@17-8-2009 10641690@unknown@formal@none@1@S@OpenOffice.org developers have responded and noted that the supposed vulnerability had not been announced through "well defined procedures" for disclosure and that the ministry had revealed nothing specific.@@@@1@28@@danf@17-8-2009 10641700@unknown@formal@none@1@S@However, the developers have been in talks with the researcher concerning the supposed vulnerability.@@@@1@14@@danf@17-8-2009 10641710@unknown@formal@none@1@S@As with Microsoft Word, documents created in OpenOffice can contain [[metadata]] which may include a complete history of what was changed, when and by whom.@@@@1@25@@danf@17-8-2009 10641720@unknown@formal@none@1@S@== Ownership ==@@@@1@3@@danf@17-8-2009 10641730@unknown@formal@none@1@S@The project and software are informally referred to as ''OpenOffice'', but project organizers 
report that this term is a [[trademark]] held by another party, requiring them to adopt ''OpenOffice.org'' as its formal name.@@@@1@33@@danf@17-8-2009 10641740@unknown@formal@none@1@S@(Due to a similar trademark issue, the [[Brazilian Portuguese]] version of the suite is distributed under the name ''BrOffice.org''.)@@@@1@19@@danf@17-8-2009 10641750@unknown@formal@none@1@S@Development is managed by staff members of StarOffice.@@@@1@8@@danf@17-8-2009 10641760@unknown@formal@none@1@S@Some delay and difficulty in implementing external contributions to the core codebase (even those from the project's corporate sponsors) has been noted.@@@@1@22@@danf@17-8-2009 10641770@unknown@formal@none@1@S@Currently, there are [http://wiki.services.openoffice.org/wiki/DerivedWorks several derived and/or proprietary works based on OOo], with some of them being:@@@@1@17@@danf@17-8-2009 10641780@unknown@formal@none@1@S@* Sun Microsystem's [[StarOffice]], with various complementary add-ons.@@@@1@8@@danf@17-8-2009 10641790@unknown@formal@none@1@S@* IBM's [[Lotus Symphony]], with a new interface based on [[Eclipse (software)|Eclipse]] (based on OO.o 1.x).@@@@1@16@@danf@17-8-2009 10641800@unknown@formal@none@1@S@* OpenOffice.org Novell edition, integrated with [[Novell Evolution|Evolution]] and with a [[OOXML]] filter.@@@@1@13@@danf@17-8-2009 10641810@unknown@formal@none@1@S@* Beijing [[Redflag]] Chinese 2000's [[RedOffice]], fully localized in Chinese characters.@@@@1@11@@danf@17-8-2009 10641820@unknown@formal@none@1@S@* Planamesa's [[NeoOffice]] for [[Mac OS X]] with Aqua support via Java.@@@@1@12@@danf@17-8-2009 10641830@unknown@formal@none@1@S@In [[May 23]], [[2007]], the OpenOffice.org community and Redflag Chinese 2000 Software Co, Ltd. announced a joint development effort focused on integrating the new features that have been added in the RedOffice localization of OpenOffice.org, as well as quality assurance and work on the core applications.@@@@1@46@@danf@17-8-2009 10641840@unknown@formal@none@1@S@Additionally, Redflag Chinese 2000 made public its commitment to the global OO.o community stating it would "strengthen its support of the development of the world's leading free and open source productivity suite", adding around 50 engineers (that have been working on RedOffice since 2006) to the project.@@@@1@47@@danf@17-8-2009 10641850@unknown@formal@none@1@S@In [[September 10]], [[2007]], the OO.o community announced that [[IBM]] had joined to support the development of OpenOffice.org.@@@@1@18@@danf@17-8-2009 10641860@unknown@formal@none@1@S@"IBM will be making initial code contributions that it has been developing as part of its Lotus Notes product, including accessibility enhancements, and will be making ongoing contributions to the feature richness and code quality of OpenOffice.org.@@@@1@37@@danf@17-8-2009 10641870@unknown@formal@none@1@S@Besides working with the community on the free productivity suite's software, IBM will also leverage OpenOffice.org technology in its products" as has been seen with [[Lotus Symphony]].@@@@1@27@@danf@17-8-2009 10641880@unknown@formal@none@1@S@Sean Poulley, the vice president of business and strategy in IBM's [[Lotus Software]] division said that IBM plans to take a leadership role in the OpenOffice.org community together with other companies such as Sun Microsystems.@@@@1@35@@danf@17-8-2009 10641890@unknown@formal@none@1@S@IBM will work within the leadership structure that exists.@@@@1@9@@danf@17-8-2009 10641900@unknown@formal@none@1@S@As of [[October 02]], [[2007]], 
[[Michael Meeks]] announced (prompting responses from Sun's [[Simon Phipps]] and Mathias Bauer) a derived OpenOffice.org work, under the wing of his employer [[Novell]], with the purpose of including new features and fixes that do not get easily integrated into the upstream OpenOffice.org core.@@@@1@50@@danf@17-8-2009 10641910@unknown@formal@none@1@S@The work is called Go-OO (http://go-oo.org/), a name under which alternative OO.o software has been available for five years.@@@@1@19@@danf@17-8-2009 10641920@unknown@formal@none@1@S@The new features are shared with Novell's edition of OOo and include:@@@@1@12@@danf@17-8-2009 10641930@unknown@formal@none@1@S@* [[Visual Basic for Applications|VBA]] macro support.@@@@1@7@@danf@17-8-2009 10641940@unknown@formal@none@1@S@* Faster start-up time.@@@@1@5@@danf@17-8-2009 10641950@unknown@formal@none@1@S@* "A [[Linear programming|linear optimization]] solver to optimize a cell value based on arbitrary constraints built into Calc".@@@@1@18@@danf@17-8-2009 10641960@unknown@formal@none@1@S@* Multimedia content support in documents, using the [[gstreamer]] multimedia framework.@@@@1@11@@danf@17-8-2009 10641970@unknown@formal@none@1@S@* Import support for [[Microsoft Works]] formats, [[WordPerfect]] graphics (WPG format) and T602 files.@@@@1@14@@danf@17-8-2009 10641980@unknown@formal@none@1@S@[http://wiki.services.openoffice.org/wiki/Contributing_Patches Details about the patch handling including metrics] can be found on the OpenOffice.org site.@@@@1@15@@danf@17-8-2009 10641990@unknown@formal@none@1@S@== Reactions ==@@@@1@3@@danf@17-8-2009 10642000@unknown@formal@none@1@S@Federal Computer Week listed OpenOffice.org as one of the "5 stars of open-source products."@@@@1@15@@danf@17-8-2009 10642010@unknown@formal@none@1@S@In contrast, OpenOffice.org was used in [[2005]] by ''[[The Guardian]]'' newspaper to illustrate what it claims are the limitations of open-source software, although the article does finish by stating that the software may be better than MS Word for books.@@@@1@40@@danf@17-8-2009 10642020@unknown@formal@none@1@S@=== Market share ===@@@@1@4@@danf@17-8-2009 10642030@unknown@formal@none@1@S@It is extremely difficult to estimate the market share of OpenOffice.org due to the fact that OpenOffice.org can be freely distributed via download sites including mirrors, peer-to-peer networks, CDs, Linux distros, etc.@@@@1@32@@danf@17-8-2009 10642040@unknown@formal@none@1@S@Nevertheless, the OpenOffice.org project tries to capture key adoption data in a market share analysis.@@@@1@14@@danf@17-8-2009 10642050@unknown@formal@none@1@S@Although Microsoft Office retains 95% of the general market as measured by revenue, OpenOffice.org and StarOffice have secured 14% of the large enterprise market as of 2004 and 19% of the small to midsize business market in 2005.@@@@1@38@@danf@17-8-2009 10642060@unknown@formal@none@1@S@The OpenOffice.org web site reports more than 98 million downloads.@@@@1@10@@danf@17-8-2009 10642070@unknown@formal@none@1@S@Other large-scale users of OpenOffice.org include [[Ministry of Defence (Singapore)|Singapore’s Ministry of Defence]] and [[Bristol]] City Council in the UK.@@@@1@21@@danf@17-8-2009 10642080@unknown@formal@none@1@S@In [[France]], OpenOffice.org has attracted the attention of both local and national government administrations who wish to rationalize their software procurement, as well as have stable, standard file formats for archival purposes.@@@@1@32@@danf@17-8-2009 10642090@unknown@formal@none@1@S@It is now the official
office suite for the [[French Gendarmerie]].@@@@1@11@@danf@17-8-2009 10642100@unknown@formal@none@1@S@Several government organizations in India, such as [[IIT Bombay]] (a renowned technical institute), the [[Supreme Court of India]], the [[Allahabad High Court]], which use Linux, completely rely on OpenOffice.org for their administration.@@@@1@32@@danf@17-8-2009 10642110@unknown@formal@none@1@S@On [[October 4]], [[2005]], Sun and [[Google]] announced a strategic partnership.@@@@1@11@@danf@17-8-2009 10642120@unknown@formal@none@1@S@As part of this agreement, Sun will add a Google search bar to OpenOffice.org, Sun and Google will engage in joint marketing activities as well as joint research and development, and Google will help distribute OpenOffice.org.@@@@1@36@@danf@17-8-2009 10642130@unknown@formal@none@1@S@Google is currently distributing StarOffice as part of the [[Google Pack]].@@@@1@11@@danf@17-8-2009 10642140@unknown@formal@none@1@S@Besides StarOffice, there are still a number of OpenOffice.org derived commercial products.@@@@1@12@@danf@17-8-2009 10642150@unknown@formal@none@1@S@Most of them are developed under [[SISSL]] license (which is valid up to OpenOffice.org 2.0 Beta 2).@@@@1@17@@danf@17-8-2009 10642160@unknown@formal@none@1@S@In general they are targeted at local or niche market, with proprietary add-ons such as speech recognition module, automatic database connection, or better [[CJK]] support.@@@@1@25@@danf@17-8-2009 10642170@unknown@formal@none@1@S@In July 2007 Everex, a division of First International Computer and the 9th largest PC supplier in the U.S., began shipping systems preloaded with OpenOffice.org 2.2 into Wal-Mart and Sam's Club throughout North America.@@@@1@34@@danf@17-8-2009 10642180@unknown@formal@none@1@S@In September 2007 IBM announced that it would supply and support OpenOffice.org branded as [[Lotus Symphony]], and integrated into Lotus Notes.@@@@1@21@@danf@17-8-2009 10642190@unknown@formal@none@1@S@IBM also announced 35 developers would be assigned to work on OpenOffice.org, and that it would join the OpenOffice.org foundation.@@@@1@20@@danf@17-8-2009 10642200@unknown@formal@none@1@S@Commentators noted parallels between IBM's 2000 support of Linux and this announcement.@@@@1@12@@danf@17-8-2009 10642210@unknown@formal@none@1@S@=== Java controversy ===@@@@1@4@@danf@17-8-2009 10642220@unknown@formal@none@1@S@In the past OpenOffice.org was criticized for an increasing dependency on the [[Java Runtime Environment]] which was not [[free software]].@@@@1@20@@danf@17-8-2009 10642230@unknown@formal@none@1@S@That Sun Microsystems is both the creator of Java and the chief supporter of OpenOffice.org drew accusations of ulterior motives for this technology choice.@@@@1@24@@danf@17-8-2009 10642240@unknown@formal@none@1@S@Version 1 depended on the [[Java Runtime Environment]] (JRE) being present on the user’s computer for some auxiliary functions, but version 2 increased the suite’s use of Java requiring a JRE.@@@@1@31@@danf@17-8-2009 10642250@unknown@formal@none@1@S@In response, [[Red Hat]] increased their efforts to improve [[free Java implementations]].@@@@1@12@@danf@17-8-2009 10642260@unknown@formal@none@1@S@Red Hat’s [[Fedora (Linux distribution)|Fedora Core]] 4 (released on [[June 13]], [[2005]]) included a beta version of OpenOffice.org version 2, running on [[GNU Compiler for Java|GCJ]] and [[GNU Classpath]].@@@@1@29@@danf@17-8-2009 10642270@unknown@formal@none@1@S@The issue of OpenOffice.org’s use of Java came to the fore in May 2005, when [[Richard 
Stallman]] appeared to call for a [[fork (software)|fork]] of the application in a posting on the [[Free Software Foundation]] website.@@@@1@36@@danf@17-8-2009 10642280@unknown@formal@none@1@S@This led to discussions within the OpenOffice.org community and between Sun staff and developers involved in [[GNU Classpath]], a free replacement for Sun’s Java implementation.@@@@1@25@@danf@17-8-2009 10642290@unknown@formal@none@1@S@Later that year, the OpenOffice.org developers also placed into their development guidelines various requirements to ensure that future versions of OpenOffice.org could be run on free implementations of Java and fixed the issues which previously prevented OpenOffice.org 2.0 from using free software Java implementations.@@@@1@44@@danf@17-8-2009 10642300@unknown@formal@none@1@S@On [[November 13]], [[2006]], Sun committed to releasing Java under the [[GNU General Public License]] in the near future.@@@@1@19@@danf@17-8-2009 10642310@unknown@formal@none@1@S@This process would end OpenOffice.org's dependence on [[non-free]] software.@@@@1@9@@danf@17-8-2009 10642320@unknown@formal@none@1@S@Between November 2006 and May 2007, Sun Microsystems made available most of their Java technologies under the GNU General Public License, in compliance with the specifications of the Java Community Process, thus making almost all of Sun's Java also free software.@@@@1@41@@danf@17-8-2009 10642330@unknown@formal@none@1@S@The following areas of OpenOffice.org 2.0 depend on the JRE being present:@@@@1@12@@danf@17-8-2009 10642340@unknown@formal@none@1@S@* The [[media player (application software)|media player]] on Unix-like systems@@@@1@10@@danf@17-8-2009 10642350@unknown@formal@none@1@S@* All document wizards in Writer@@@@1@6@@danf@17-8-2009 10642360@unknown@formal@none@1@S@* Accessibility tools@@@@1@3@@danf@17-8-2009 10642370@unknown@formal@none@1@S@* Report Autopilot@@@@1@3@@danf@17-8-2009 10642380@unknown@formal@none@1@S@* [[JDBC]] driver support@@@@1@4@@danf@17-8-2009 10642390@unknown@formal@none@1@S@* [[Hsqldb|HSQL]] database engine, which is used in OpenOffice.org Base@@@@1@10@@danf@17-8-2009 10642400@unknown@formal@none@1@S@* [[XSLT]] filters@@@@1@3@@danf@17-8-2009 10642410@unknown@formal@none@1@S@* [[BeanShell]], the [[NetBeans]] scripting language and the Java UNO bridge@@@@1@11@@danf@17-8-2009 10642420@unknown@formal@none@1@S@* Export filters to the Aportis.doc (.pdb) format for the [[Palm OS]] or [[Pocket Word]] (.psw) format for the [[Pocket PC]]@@@@1@21@@danf@17-8-2009 10642430@unknown@formal@none@1@S@* Export filter to [[LaTeX]]@@@@1@5@@danf@17-8-2009 10642440@unknown@formal@none@1@S@* Export filter to [[MediaWiki]]'s [[wikitext]]@@@@1@6@@danf@17-8-2009 10642450@unknown@formal@none@1@S@A common point of confusion is that [[mail merge]] to generate emails requires the Java API JavaMail in [[StarOffice]]; however, as of version 2.0.1, OpenOffice.org uses a [[Python (programming language)|Python]]-component instead.@@@@1@31@@danf@17-8-2009 10642460@unknown@formal@none@1@S@=== Complementary software ===@@@@1@4@@danf@17-8-2009 10642470@unknown@formal@none@1@S@OpenOffice.org provides replacement for MS Office's [[Microsoft Word]], [[Microsoft Excel]], [[Microsoft PowerPoint]], [[Microsoft Access]], [[Equation Editor|Microsoft Equation Editor]] and [[Microsoft Visio]].@@@@1@21@@danf@17-8-2009 10642480@unknown@formal@none@1@S@But to level the equivalent functionality from the rest of MS Office, OOo can be complemented with other open source programs such as:@@@@1@23@@danf@17-8-2009 
10642490@unknown@formal@none@1@S@* [[Novell Evolution|Evolution]] or [[Mozilla Thunderbird|Thunderbird]]/[[Lightning (software)|Lightning]] for a PIM like [[Microsoft Outlook]].@@@@1@13@@danf@17-8-2009 10642500@unknown@formal@none@1@S@* [[OpenProj]] (which seeks integration with OOo, but might be limited due to licensing issues) for [[Microsoft Project]].@@@@1@18@@danf@17-8-2009 10642510@unknown@formal@none@1@S@* [[Scribus]] for [[Microsoft Publisher]]@@@@1@5@@danf@17-8-2009 10642520@unknown@formal@none@1@S@* [[O3spaces]] for [[Sharepoint]]@@@@1@4@@danf@17-8-2009 10642530@unknown@formal@none@1@S@Microsoft also provides Administrative Template Files ("adm files") that allow MS Office to be configured using Windows Group Policy.@@@@1@19@@danf@17-8-2009 10642540@unknown@formal@none@1@S@Equivalent functionality for OpenOffice.org is provided by [http://openoffice-enterprise.com/ OpenOffice-Enterprise], a commercial product from Open Office Technology, Inc.@@@@1@17@@danf@17-8-2009 10642550@unknown@formal@none@1@S@=== Issues ===@@@@1@3@@danf@17-8-2009 10642560@unknown@formal@none@1@S@OpenOffice.org has been criticized for slow start times and extensive CPU and RAM usage in comparison to other competitive software such as Microsoft Office.@@@@1@24@@danf@17-8-2009 10642570@unknown@formal@none@1@S@In comparison, tests between OpenOffice.org 2.2 and Microsoft Office 2007 have found that OpenOffice.org takes approximately 2 times the processing time and memory to load itself along with a blank file; and took approximately 4.7 times the processing time and 3.9 times the memory to open an extremely large spreadsheet file.@@@@1@49@@danf@17-8-2009 10642580@unknown@formal@none@1@S@Critics have pointed to excessive code bloat and OpenOffice.org's loading of the [[Java Virtual Machine|Java Runtime Environment]] as possible reasons for the slow speeds and excessive memory usage.@@@@1@28@@danf@17-8-2009 10642590@unknown@formal@none@1@S@However, since OpenOffice.org 2.2 the performance of OpenOffice.org has been improved dramatically.@@@@1@12@@danf@17-8-2009 10642600@unknown@formal@none@1@S@One of the greatest challenges is its ability to be truly cross compatible with other applications.@@@@1@16@@danf@17-8-2009 10642610@unknown@formal@none@1@S@Since Openoffice.org is forced to reverse engineer proprietary binary formats due to unavailability of open specifications, slight formatting incompatibilities tend to exist when files are saved in non-native format.@@@@1@29@@danf@17-8-2009 10642620@unknown@formal@none@1@S@For example, a complex .doc document formatted under OpenOffice.org, is usually not displayed with the correct format when opened with Microsoft Office.@@@@1@22@@danf@17-8-2009 10642630@unknown@formal@none@1@S@== Retail ==@@@@1@3@@danf@17-8-2009 10642640@unknown@formal@none@1@S@The [[free software license]] under which OpenOffice.org is distributed allows unlimited use of the software for both home and business use, including unlimited redistribution of the software.@@@@1@27@@danf@17-8-2009 10642650@unknown@formal@none@1@S@Several businesses sell the OpenOffice.org suite on auction websites such as [[eBay]], offering value-added services such as 24/7 technical support, download mirrors, and CD mailing.@@@@1@25@@danf@17-8-2009 10642660@unknown@formal@none@1@S@However, often the 24/7 support offered is not provided by the company selling the software, but rather by the official OpenOffice.org mailing list.@@@@1@23@@danf@17-8-2009 10650010@unknown@formal@none@1@S@
Parsing
@@@@1@1@@danf@17-8-2009 10650020@unknown@formal@none@1@S@In [[computer science]] and [[linguistics]], '''parsing''', or, more formally, '''syntactic analysis''', is the process of analyzing a sequence of [[Token (parser)|tokens]] to determine grammatical structure with respect to a given (more or less) [[formal grammar]].@@@@1@35@@danf@17-8-2009 10650030@unknown@formal@none@1@S@A '''parser''' is thus one of the components in an [[interpreter]] or [[compiler]], where it captures the implied hierarchy of the input text and transforms it into a form suitable for further processing (often some kind of [[parse tree]], [[abstract syntax tree]] or other hierarchical structure) and normally checks for syntax errors at the same time.@@@@1@56@@danf@17-8-2009 10650040@unknown@formal@none@1@S@The parser often uses a separate [[lexical analyser]] to create tokens from the sequence of input characters.@@@@1@17@@danf@17-8-2009 10650050@unknown@formal@none@1@S@Parsers may be programmed by hand or may be semi-automatically generated (in some programming language) by a tool (such as [[Yet Another Compiler Compiler|Yacc]]) from a grammar written in [[Backus-Naur form]].@@@@1@31@@danf@17-8-2009 10650060@unknown@formal@none@1@S@Parsing is also an earlier term for the diagramming of sentences of natural languages, and is still used for the diagramming of [[Inflection|inflected]] languages, such as the [[Romance languages|Romance languages]] or [[Latin]].@@@@1@32@@danf@17-8-2009 10650070@unknown@formal@none@1@S@Parsers can also be constructed as executable specifications of grammars in functional programming languages.@@@@1@14@@danf@17-8-2009 10650080@unknown@formal@none@1@S@Frost, Hafiz and Callaghan have built on the work of others to construct a set of [[higher-order function]]s (called [[parser combinators]]) which allow polynomial time and space complexity top-down parser to be constructed as executable specifications of ambiguous grammars containing left-recursive productions.@@@@1@42@@danf@17-8-2009 10650090@unknown@formal@none@1@S@The [http://www.cs.uwindsor.ca/~hafiz/proHome.html X-SAIGA] site has more about the algorithms and implementation details.@@@@1@12@@danf@17-8-2009 10650100@unknown@formal@none@1@S@== Human languages ==@@@@1@4@@danf@17-8-2009 10650110@unknown@formal@none@1@S@:''Also see [[:Category:Natural language parsing]]''@@@@1@5@@danf@17-8-2009 10650120@unknown@formal@none@1@S@In some [[machine translation]] and [[natural language processing]] systems, human languages are parsed by computer programs.@@@@1@16@@danf@17-8-2009 10650130@unknown@formal@none@1@S@Human sentences are not easily parsed by programs, as there is substantial [[syntactic ambiguity|ambiguity]] in the structure of human language.@@@@1@20@@danf@17-8-2009 10650140@unknown@formal@none@1@S@In order to parse natural language data, researchers must first agree on the [[grammar]] to be used.@@@@1@17@@danf@17-8-2009 10650150@unknown@formal@none@1@S@The choice of syntax is affected by both [[linguistic]] and computational concerns; for instance some parsing systems use [[lexical functional grammar]], but in general, parsing for grammars of this type is known to be [[NP-complete]].@@@@1@35@@danf@17-8-2009 10650160@unknown@formal@none@1@S@[[Head-driven phrase structure grammar]] is another linguistic formalism which has been popular in the parsing community, but other research efforts have focused on less complex formalisms such as the one used in the Penn [[Treebank]].@@@@1@35@@danf@17-8-2009 
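To make this structural ambiguity concrete, the short sketch below builds two competing constituency analyses for the same sentence, of the kind a natural-language parser must choose between. The example sentence, the tuple-based tree representation and the helper function are illustrative choices added here, not part of any particular parser or treebank.
<source lang="python">
# Illustrative sketch: two competing constituency trees for one sentence,
# showing the prepositional-phrase attachment ambiguity a parser must resolve.
# Trees are plain nested tuples: (label, child1, child2, ...).

tokens = ["I", "saw", "the", "man", "with", "the", "telescope"]

# Analysis 1: the PP "with the telescope" modifies the verb phrase
# (the seeing was done with the telescope).
parse_vp_attach = (
    "S",
    ("NP", "I"),
    ("VP",
        ("V", "saw"),
        ("NP", ("Det", "the"), ("N", "man")),
        ("PP", ("P", "with"), ("NP", ("Det", "the"), ("N", "telescope")))),
)

# Analysis 2: the PP modifies the noun phrase (the man has the telescope).
parse_np_attach = (
    "S",
    ("NP", "I"),
    ("VP",
        ("V", "saw"),
        ("NP",
            ("NP", ("Det", "the"), ("N", "man")),
            ("PP", ("P", "with"), ("NP", ("Det", "the"), ("N", "telescope"))))),
)

def leaves(tree):
    """Return the words covered by a tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree[1:]:
        words.extend(leaves(child))
    return words

# Both analyses cover exactly the same tokens; only the structure differs,
# so the parser needs additional (e.g. statistical) evidence to pick one.
assert leaves(parse_vp_attach) == leaves(parse_np_attach) == tokens
</source>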
10650170@unknown@formal@none@1@S@[[Shallow parsing]] aims to find only the boundaries of major constituents such as noun phrases.@@@@1@15@@danf@17-8-2009 10650180@unknown@formal@none@1@S@Another popular strategy for avoiding linguistic controversy is [[dependency grammar]] parsing.@@@@1@11@@danf@17-8-2009 10650190@unknown@formal@none@1@S@Most modern parsers are at least partly [[statistics|statistical]]; that is, they rely on a corpus of training data which has already been annotated (parsed by hand).@@@@1@26@@danf@17-8-2009 10650200@unknown@formal@none@1@S@This approach allows the system to gather information about the frequency with which various constructions occur in specific contexts.@@@@1@19@@danf@17-8-2009 10650210@unknown@formal@none@1@S@''(See [[machine learning]].)''@@@@1@3@@danf@17-8-2009 10650220@unknown@formal@none@1@S@Approaches which have been used include straightforward [[PCFG]]s (probabilistic context free grammars), [[maximum entropy]], and [[neural net]]s.@@@@1@17@@danf@17-8-2009 10650230@unknown@formal@none@1@S@Most of the more successful systems use ''lexical'' statistics (that is, they consider the identities of the words involved, as well as their [[part of speech]]).@@@@1@26@@danf@17-8-2009 10650240@unknown@formal@none@1@S@However, such systems are vulnerable to [[overfitting]] and require some kind of smoothing to be effective.@@@@1@16@@danf@17-8-2009 10650250@unknown@formal@none@1@S@Parsing algorithms for natural language cannot rely on the grammar having 'nice' properties as with manually-designed grammars for programming languages.@@@@1@20@@danf@17-8-2009 10650260@unknown@formal@none@1@S@As mentioned earlier, some grammar formalisms are very computationally difficult to parse; in general, even if the desired structure is not [[context-free]], some kind of context-free approximation to the grammar is used to perform a first pass.@@@@1@37@@danf@17-8-2009 10650265@unknown@formal@none@1@S@Algorithms which use context-free grammars often rely on some variant of the [[CKY algorithm]], usually with some [[heuristic (computer science)|heuristic]] to prune away unlikely analyses to save time.@@@@1@28@@danf@17-8-2009 10650270@unknown@formal@none@1@S@''(See [[chart parsing]].)''@@@@1@3@@danf@17-8-2009 10650280@unknown@formal@none@1@S@However, some systems trade speed for accuracy using, e.g., linear-time versions of the [[Shift-reduce parsing|shift-reduce]] algorithm.@@@@1@16@@danf@17-8-2009 10650290@unknown@formal@none@1@S@A somewhat recent development has been [[parse reranking]] in which the parser proposes some large number of analyses, and a more complex system selects the best option.@@@@1@27@@danf@17-8-2009 10650300@unknown@formal@none@1@S@Each analysis normally takes the form of a tree in which each part branches into its subparts.@@@@1@10@@danf@17-8-2009 10650310@unknown@formal@none@1@S@== Programming languages ==@@@@1@4@@danf@17-8-2009 10650320@unknown@formal@none@1@S@The most common use of a parser is as a component of a [[compiler]] or [[interpreter]].@@@@1@16@@danf@17-8-2009 10650330@unknown@formal@none@1@S@This parses the [[source code]] of a [[computer programming language]] to create some form of internal representation.@@@@1@17@@danf@17-8-2009 10650340@unknown@formal@none@1@S@Programming languages tend to be specified in terms of a [[context-free grammar]] because fast and efficient parsers can be written for them.@@@@1@22@@danf@17-8-2009 10650350@unknown@formal@none@1@S@Parsers are written by hand or generated by [[parser generator]]s.@@@@1@10@@danf@17-8-2009
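As an illustration of such a specification, the sketch below writes out a toy context-free grammar for arithmetic expressions as plain Python data. The production rules and token names are hypothetical examples, not the input format of any particular parser generator; tools such as Yacc accept an equivalent grammar written in Backus-Naur form.
<source lang="python">
# A minimal sketch (not tied to any particular tool) of how a small language
# fragment can be specified as a context-free grammar.  Nonterminals are the
# dictionary keys; each maps to a list of alternative right-hand sides.

arithmetic_grammar = {
    "Expr":   [["Expr", "plus", "Term"], ["Term"]],
    "Term":   [["Term", "times", "Factor"], ["Factor"]],
    "Factor": [["lparen", "Expr", "rparen"], ["number"]],
}

start_symbol = "Expr"

def is_nonterminal(symbol):
    """A symbol is a nonterminal exactly when the grammar defines productions for it."""
    return symbol in arithmetic_grammar

# Sanity check: every right-hand-side symbol is either a nonterminal or a known token name.
known_tokens = {"plus", "times", "lparen", "rparen", "number"}
for lhs, alternatives in arithmetic_grammar.items():
    for rhs in alternatives:
        for symbol in rhs:
            assert is_nonterminal(symbol) or symbol in known_tokens
</source>
A parser generator turns a specification of this kind into table-driven parsing code; a hand-written alternative for a similar grammar is sketched after the list of top-down parsers below.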
10650360@unknown@formal@none@1@S@Context-free grammars are limited in the extent to which they can express all of the requirements of a language.@@@@1@19@@danf@17-8-2009 10650370@unknown@formal@none@1@S@Informally, the reason is that the memory of such a language is limited.@@@@1@13@@danf@17-8-2009 10650380@unknown@formal@none@1@S@The grammar cannot remember the presence of a construct over an arbitrarily long input; this is necessary for a language in which, for example, a name must be declared before it may be referenced.@@@@1@34@@danf@17-8-2009 10650390@unknown@formal@none@1@S@More powerful grammars that can express this constraint, however, cannot be parsed efficiently.@@@@1@13@@danf@17-8-2009 10650400@unknown@formal@none@1@S@Thus, it is a common strategy to create a relaxed parser for a context-free grammar which accepts a superset of the desired language constructs (that is, it accepts some invalid constructs); later, the unwanted constructs can be filtered out.@@@@1@39@@danf@17-8-2009 10650410@unknown@formal@none@1@S@===Overview of process===@@@@1@3@@danf@17-8-2009 10650420@unknown@formal@none@1@S@[[image:Parser_Flow.gif|right|Flow of data in a typical parser]] The following example demonstrates the common case of parsing a computer language with two levels of grammar: lexical and syntactic.@@@@1@27@@danf@17-8-2009 10650430@unknown@formal@none@1@S@The first stage is the token generation, or [[lexical analysis]], by which the input character stream is split into meaningful symbols defined by a grammar of [[regular expression]]s.@@@@1@28@@danf@17-8-2009 10650440@unknown@formal@none@1@S@For example, a calculator program would look at an input such as "12*(3+4)^2" and split it into the tokens 12, *, (, 3, +, 4, ), ^, and 2, each of which is a meaningful symbol in the context of an arithmetic expression.@@@@1@43@@danf@17-8-2009 10650450@unknown@formal@none@1@S@The parser would contain rules to tell it that the characters *, +, ^, ( and ) mark the start of a new token, so meaningless tokens like "12*" or "(3" will not be generated.@@@@1@35@@danf@17-8-2009 10650460@unknown@formal@none@1@S@The next stage is parsing or syntactic analysis, which is checking that the tokens form an allowable expression.@@@@1@18@@danf@17-8-2009 10650470@unknown@formal@none@1@S@This is usually done with reference to a [[context-free grammar]] which recursively defines components that can make up an expression and the order in which they must appear.@@@@1@28@@danf@17-8-2009 10650480@unknown@formal@none@1@S@However, not all rules defining programming languages can be expressed by context-free grammars alone, for example type validity and proper declaration of identifiers.@@@@1@23@@danf@17-8-2009 10650490@unknown@formal@none@1@S@These rules can be formally expressed with [[attribute grammar]]s.@@@@1@9@@danf@17-8-2009 10650500@unknown@formal@none@1@S@The final phase is [[Semantic analysis (computer science)|semantic parsing]] or analysis, which is working out the implications of the expression just validated and taking the appropriate action.@@@@1@27@@danf@17-8-2009 10650510@unknown@formal@none@1@S@In the case of a calculator or interpreter, the action is to evaluate the expression or program; a compiler, on the other hand, would generate some kind of code.@@@@1@29@@danf@17-8-2009 10650520@unknown@formal@none@1@S@Attribute grammars can also be used to define these actions.@@@@1@10@@danf@17-8-2009 10650530@unknown@formal@none@1@S@==Types of parsers==@@@@1@3@@danf@17-8-2009 10650540@unknown@formal@none@1@S@The task of 
the parser is essentially to determine if and how the input can be derived from the start symbol of the grammar.@@@@1@24@@danf@17-8-2009 10650550@unknown@formal@none@1@S@This can be done in essentially two ways:@@@@1@8@@danf@17-8-2009 10650560@unknown@formal@none@1@S@*[[Top-down parsing]] - Top-down parsing can be viewed as an attempt to find left-most derivations of an input-stream by searching for [[parse tree|parse-trees]] using a top-down expansion of the given [[formal grammar]] rules.@@@@1@33@@danf@17-8-2009 10650570@unknown@formal@none@1@S@Tokens are consumed from left to right.@@@@1@7@@danf@17-8-2009 10650580@unknown@formal@none@1@S@Inclusive choice is used to accommodate [[ambiguity]] by expanding all alternative right-hand-sides of grammar rules.@@@@1@16@@danf@17-8-2009 10650590@unknown@formal@none@1@S@[[LL parser]]s and [[recursive-descent parser]]s are examples of top-down parsers, which cannot accommodate [[left recursion|left recursive]] productions.@@@@1@19@@danf@17-8-2009 10650600@unknown@formal@none@1@S@Although it had been believed that simple implementations of top-down parsing cannot accommodate direct and indirect left-recursion and may require exponential time and space complexity while parsing ambiguous [[context-free grammar]]s, a more sophisticated algorithm for top-down parsing has been created by Frost, Hafiz, and Callaghan which accommodates [[ambiguity]] and [[left recursion]] in polynomial time and which generates polynomial-size representations of the potentially-exponential number of parse trees.@@@@1@65@@danf@17-8-2009 10650610@unknown@formal@none@1@S@Their algorithm is able to produce both left-most and right-most derivations of an input w.r.t. a given CFG.@@@@1@18@@danf@17-8-2009 10650620@unknown@formal@none@1@S@*[[Bottom-up parsing]] - A parser can start with the input and attempt to rewrite it to the start symbol.@@@@1@19@@danf@17-8-2009 10650630@unknown@formal@none@1@S@Intuitively, the parser attempts to locate the most basic elements, then the elements containing these, and so on.@@@@1@18@@danf@17-8-2009 10650640@unknown@formal@none@1@S@[[LR parser]]s are examples of bottom-up parsers.@@@@1@7@@danf@17-8-2009 10650650@unknown@formal@none@1@S@Another term used for this type of parser is Shift-Reduce parsing.@@@@1@11@@danf@17-8-2009 10650660@unknown@formal@none@1@S@Another important distinction is whether the parser generates a ''leftmost derivation'' or a ''rightmost derivation'' (see [[context-free grammar]]).@@@@1@18@@danf@17-8-2009 10650670@unknown@formal@none@1@S@LL parsers will generate a leftmost [[derivation]] and LR parsers will generate a rightmost derivation (although usually in reverse).@@@@1@20@@danf@17-8-2009 10650680@unknown@formal@none@1@S@== Examples of parsers ==@@@@1@5@@danf@17-8-2009 10650690@unknown@formal@none@1@S@=== Top-down parsers ===@@@@1@4@@danf@17-8-2009 10650700@unknown@formal@none@1@S@Some of the parsers that use [[top-down parsing]] include:@@@@1@9@@danf@17-8-2009 10650710@unknown@formal@none@1@S@* [[Recursive descent parser]]@@@@1@4@@danf@17-8-2009 10650720@unknown@formal@none@1@S@* [[LL parser]] ('''L'''eft-to-right, '''L'''eftmost derivation)@@@@1@6@@danf@17-8-2009 10650730@unknown@formal@none@1@S@* [http://www.cs.uwindsor.ca/~hafiz/proHome.html X-SAIGA] - eXecutable SpecificAtIons of GrAmmars.@@@@1@8@@danf@17-8-2009 10650740@unknown@formal@none@1@S@Contains publications related to a top-down parsing algorithm that supports left recursion and ambiguity in polynomial time and space.@@@@1@17@@danf@17-8-2009
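Building on the calculator example from the overview above, the following minimal sketch shows a hand-written top-down (recursive-descent) parser with one function per grammar rule. The grammar, token definitions and all names are illustrative choices made here, not a standard or published implementation.
<source lang="python">
# Minimal sketch of a hand-written top-down (recursive-descent) parser for the
# calculator example from the overview ("12*(3+4)^2").
#
#   Expr   -> Term   (("+" | "-") Term)*
#   Term   -> Power  (("*" | "/") Power)*
#   Power  -> Factor ("^" Power)?          # right-associative exponent
#   Factor -> NUMBER | "(" Expr ")"

import re

def tokenize(text):
    """Lexical analysis: split the character stream into meaningful tokens."""
    return re.findall(r"\d+|[+\-*/^()]", text)

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def eat(self, expected=None):
        token = self.peek()
        if token is None or (expected is not None and token != expected):
            raise SyntaxError("unexpected token: %r" % token)
        self.pos += 1
        return token

    # One function per nonterminal: the chain of calls mirrors the parse tree.
    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            node = (self.eat(), node, self.term())
        return node

    def term(self):
        node = self.power()
        while self.peek() in ("*", "/"):
            node = (self.eat(), node, self.power())
        return node

    def power(self):
        node = self.factor()
        if self.peek() == "^":
            node = (self.eat(), node, self.power())
        return node

    def factor(self):
        if self.peek() == "(":
            self.eat("(")
            node = self.expr()
            self.eat(")")
            return node
        return int(self.eat())

tree = Parser(tokenize("12*(3+4)^2")).expr()
print(tree)   # ('*', 12, ('^', ('+', 3, 4), 2))
</source>
Because each nonterminal corresponds to one function, the sequence of calls made while consuming the tokens from left to right traces out exactly the leftmost derivation that a formal top-down expansion of the grammar would produce.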
10650750@unknown@formal@none@1@S@=== Bottom-up parsers ===@@@@1@4@@danf@17-8-2009 10650760@unknown@formal@none@1@S@Some of the parsers that use [[bottom-up parsing]] include:@@@@1@9@@danf@17-8-2009 10650770@unknown@formal@none@1@S@* Precedence parser@@@@1@3@@danf@17-8-2009 10650780@unknown@formal@none@1@S@** [[Operator-precedence parser]]@@@@1@3@@danf@17-8-2009 10650790@unknown@formal@none@1@S@** [[Simple precedence parser]]@@@@1@4@@danf@17-8-2009 10650800@unknown@formal@none@1@S@* BC (bounded context) parsing@@@@1@5@@danf@17-8-2009 10650810@unknown@formal@none@1@S@* [[LR parser]] ('''L'''eft-to-right, '''R'''ightmost derivation)@@@@1@6@@danf@17-8-2009 10650820@unknown@formal@none@1@S@** [[SLR parser|Simple LR (SLR) parser]]@@@@1@6@@danf@17-8-2009 10650830@unknown@formal@none@1@S@** [[LALR parser]]@@@@1@3@@danf@17-8-2009 10650840@unknown@formal@none@1@S@** [[Canonical LR parser|Canonical LR (LR(1)) parser]]@@@@1@7@@danf@17-8-2009 10650850@unknown@formal@none@1@S@** [[GLR parser]]@@@@1@3@@danf@17-8-2009 10650860@unknown@formal@none@1@S@* [[CYK algorithm|CYK parser]]@@@@1@4@@danf@17-8-2009 10660010@unknown@formal@none@1@S@
Lexical category
@@@@1@2@@danf@17-8-2009 10660020@unknown@formal@none@1@S@In [[grammar]], a '''lexical category''' (also '''word class''', '''lexical class''', or in traditional grammar '''part of speech''') is a linguistic category of words (or more precisely ''lexical items''), which is generally defined by the [[syntactic]] or [[morphology (linguistics)|morphological]] behaviour of the lexical item in question.@@@@1@45@@danf@17-8-2009 10660030@unknown@formal@none@1@S@Common linguistic categories include ''noun'' and ''verb'', among others.@@@@1@9@@danf@17-8-2009 10660040@unknown@formal@none@1@S@There are [[open class word|open word classes]], which constantly acquire new members, and [[closed class word|closed word classes]], which acquire new members infrequently if at all.@@@@1@26@@danf@17-8-2009 10660050@unknown@formal@none@1@S@Different languages may have different lexical categories, or they might associate different properties to the same one.@@@@1@17@@danf@17-8-2009 10660060@unknown@formal@none@1@S@For example, [[Japanese language|Japanese]] has at least three classes of adjectives where English has one; Chinese and Japanese have [[measure word]]s while European languages have nothing resembling them; many languages don't have a distinction between adjectives and adverbs, or adjectives and nouns, etc.@@@@1@43@@danf@17-8-2009 10660070@unknown@formal@none@1@S@Many linguists argue that the formal distinctions between parts of speech must be made within the framework of a specific language or language family, and should not be carried over to other languages or language families.@@@@1@36@@danf@17-8-2009 10660080@unknown@formal@none@1@S@==History==@@@@1@1@@danf@17-8-2009 10660090@unknown@formal@none@1@S@The classification of words into lexical categories is found from the earliest moments in the [[history of linguistics]].@@@@1@18@@danf@17-8-2009 10660100@unknown@formal@none@1@S@In the ''[[Nirukta]]'', written in the [[5th century BCE|5th]] or [[6th century BCE]], the [[Sanskrit grammarian]] [[Yāska]] defined four main categories of words :@@@@1@24@@danf@17-8-2009 10660110@unknown@formal@none@1@S@# nāma - [[noun]]s or substantives@@@@1@6@@danf@17-8-2009 10660120@unknown@formal@none@1@S@# ākhyāta - [[verb]]s@@@@1@4@@danf@17-8-2009 10660130@unknown@formal@none@1@S@# upasarga - pre-verbs or [[prefix]]es@@@@1@6@@danf@17-8-2009 10660140@unknown@formal@none@1@S@# nipāta - [[Grammatical particle|particle]]s, invariant words (perhaps [[prepositions]])@@@@1@9@@danf@17-8-2009 10660150@unknown@formal@none@1@S@These four were grouped into two large classes: [[inflection|inflected]] (nouns and verbs) and uninflected (pre-verbs and particles).@@@@1@17@@danf@17-8-2009 10660160@unknown@formal@none@1@S@A century or two later, the [[Classical Greece|Greek]] scholar [[Plato]] wrote in the [[Cratylus (dialogue)|''Cratylus'' dialog]] that "... 
sentences are, I conceive, a combination of verbs [''rhēma''] and nouns [''ónoma'']".@@@@1@30@@danf@17-8-2009 10660170@unknown@formal@none@1@S@Another class, "conjunctions" (covering [[Grammatical conjunction|conjunction]]s, [[pronoun]]s, and the [[article (grammar)|article]]), was later added by [[Aristotle]].@@@@1@16@@danf@17-8-2009 10660180@unknown@formal@none@1@S@By the end of the [[2nd century BCE]], the classification scheme had been expanded into eight categories, seen in the ''[[Art of Grammar|Tékhnē grammatiké]]'':@@@@1@24@@danf@17-8-2009 10660190@unknown@formal@none@1@S@# Noun: a part of speech inflected for case, signifying a concrete or abstract entity@@@@1@15@@danf@17-8-2009 10660200@unknown@formal@none@1@S@# Verb: a part of speech without case inflection, but inflected for tense, person and number, signifying an activity or process performed or undergone@@@@1@24@@danf@17-8-2009 10660210@unknown@formal@none@1@S@# Participle: a part of speech sharing the features of the verb and the noun@@@@1@15@@danf@17-8-2009 10660220@unknown@formal@none@1@S@# Article: a part of speech inflected for case and preposed or postposed to nouns (the relative pronoun is meant by the postposed article)@@@@1@24@@danf@17-8-2009 10660230@unknown@formal@none@1@S@# Pronoun: a part of speech substitutable for a noun and marked for person@@@@1@14@@danf@17-8-2009 10660240@unknown@formal@none@1@S@# Preposition: a part of speech placed before other words in composition and in syntax@@@@1@15@@danf@17-8-2009 10660250@unknown@formal@none@1@S@# Adverb: a part of speech without inflection, in modification of or in addition to a verb@@@@1@17@@danf@17-8-2009 10660260@unknown@formal@none@1@S@# Conjunction: a part of speech binding together the discourse and filling gaps in its interpretation@@@@1@16@@danf@17-8-2009 10660270@unknown@formal@none@1@S@The [[Latin grammar]]ian [[Priscian]] ([[floruit|fl.]] [[500 CE]]) modified the above eight-fold system, substituting "[[interjection]]" for "article".@@@@1@16@@danf@17-8-2009 10660280@unknown@formal@none@1@S@It wasn't until 1767 that the [[adjective]] was taken as a separate class.@@@@1@13@@danf@17-8-2009 10660290@unknown@formal@none@1@S@Traditional English grammar is patterned after the European tradition above, and is still taught in schools and used in [[dictionaries]].@@@@1@20@@danf@17-8-2009 10660300@unknown@formal@none@1@S@It names eight parts of speech: [[noun]], [[verb]], [[adjective]], [[adverb]], [[pronoun]], [[preposition]], [[Grammatical conjunction|conjunction]], and [[interjection]] (sometimes called an exclamation).@@@@1@20@@danf@17-8-2009 10660310@unknown@formal@none@1@S@==Controversies==@@@@1@1@@danf@17-8-2009 10660320@unknown@formal@none@1@S@Since the Greek grammarians of 2nd century BCE, parts of speech have been defined by [[morphology (linguistics)|morphological]], [[syntax|syntactic]] and [[semantics|semantic]] criteria.@@@@1@21@@danf@17-8-2009 10660330@unknown@formal@none@1@S@However, there is currently no generally agreed-upon classification scheme that can apply to all languages, or even a set of criteria upon which such a scheme should be based.@@@@1@29@@danf@17-8-2009 10660340@unknown@formal@none@1@S@Linguists recognize that the above list of eight word classes is simplified and artificial.@@@@1@14@@danf@17-8-2009 10660350@unknown@formal@none@1@S@For example, "adverb" is to some extent a catch-all class that includes words with many different functions.@@@@1@17@@danf@17-8-2009 10660360@unknown@formal@none@1@S@Some have even argued that the most 
basic of category distinctions, that of nouns and verbs, is unfounded, or not applicable to certain languages.@@@@1@24@@danf@17-8-2009 10660370@unknown@formal@none@1@S@==Functional classification==@@@@1@2@@danf@17-8-2009 10660380@unknown@formal@none@1@S@Common ways of delimiting words by function include:@@@@1@8@@danf@17-8-2009 10660390@unknown@formal@none@1@S@* '''[[Open word classes]]:'''@@@@1@4@@danf@17-8-2009 10660400@unknown@formal@none@1@S@**[[adjective]]s@@@@1@1@@danf@17-8-2009 10660410@unknown@formal@none@1@S@**[[adverb]]s@@@@1@1@@danf@17-8-2009 10660420@unknown@formal@none@1@S@**[[interjection]]s@@@@1@1@@danf@17-8-2009 10660430@unknown@formal@none@1@S@**[[noun]]s@@@@1@1@@danf@17-8-2009 10660440@unknown@formal@none@1@S@**[[verb]]s (except [[auxiliary verb]]s)@@@@1@4@@danf@17-8-2009 10660450@unknown@formal@none@1@S@* '''[[Closed word classes]]:'''@@@@1@4@@danf@17-8-2009 10660460@unknown@formal@none@1@S@**[[auxiliary verb]]s@@@@1@2@@danf@17-8-2009 10660470@unknown@formal@none@1@S@**[[clitic]]s@@@@1@1@@danf@17-8-2009 10660480@unknown@formal@none@1@S@**[[coverb]]s@@@@1@1@@danf@17-8-2009 10660490@unknown@formal@none@1@S@**[[Grammatical conjunction|conjunction]]s@@@@1@2@@danf@17-8-2009 10660500@unknown@formal@none@1@S@**[[determiner (class)|Determiner]]s ([[article (grammar)|article]]s, [[quantifier]]s, [[demonstrative adjective]]s, and [[possessive adjective]]s)@@@@1@10@@danf@17-8-2009 10660510@unknown@formal@none@1@S@**[[grammatical particle|particle]]s@@@@1@2@@danf@17-8-2009 10660520@unknown@formal@none@1@S@**[[measure word]]s@@@@1@2@@danf@17-8-2009 10660530@unknown@formal@none@1@S@**[[adposition]]s (prepositions, postpositions, and circumpositions)@@@@1@5@@danf@17-8-2009 10660540@unknown@formal@none@1@S@**[[preverb]]s@@@@1@1@@danf@17-8-2009 10660550@unknown@formal@none@1@S@**[[pronoun]]s@@@@1@1@@danf@17-8-2009 10660560@unknown@formal@none@1@S@**[[Contraction (grammar)|contraction]]s@@@@1@2@@danf@17-8-2009 10660570@unknown@formal@none@1@S@**[[Names of numbers in English#Cardinal numbers|cardinal numbers]]@@@@1@7@@danf@17-8-2009 10660580@unknown@formal@none@1@S@==English==@@@@1@1@@danf@17-8-2009 10660590@unknown@formal@none@1@S@[[English language|English]] frequently does not [[marker (linguistics)|mark]] words as belonging to one part of speech or another.@@@@1@17@@danf@17-8-2009 10660600@unknown@formal@none@1@S@Words like ''neigh'', ''break'', ''outlaw'', ''laser'', ''microwave'' and ''telephone'' might all be either verb forms or nouns.@@@@1@17@@danf@17-8-2009 10660610@unknown@formal@none@1@S@Although ''-ly'' is an adverb marker, not all adverbs end in ''-ly'' and not all words ending in ''-ly'' are adverbs.@@@@1@21@@danf@17-8-2009 10660620@unknown@formal@none@1@S@For instance, ''tomorrow'', ''slow'', ''fast'', ''crosswise'' can all be adverbs, while ''early'', ''friendly'', ''ugly'' are all adjectives (though ''early'' can also function as an adverb).@@@@1@25@@danf@17-8-2009 10660630@unknown@formal@none@1@S@In certain circumstances, even words with primarily grammatical functions can be used as verbs or nouns, as in "We must look to the ''hows'' and not just the ''whys''" or "Miranda was ''to-ing and fro-ing'' and not paying attention".@@@@1@39@@danf@17-8-2009 10670010@unknown@formal@none@1@S@
Part-of-speech tagging
@@@@1@2@@danf@17-8-2009 10670020@unknown@formal@none@1@S@'''Part-of-speech tagging''' ('''POS tagging''' or '''POST'''), also called '''grammatical tagging''', is the process of marking up the words in a text as corresponding to a particular [[parts of speech|part of speech]], based on both its definition, as well as its context—i.e., relationship with adjacent and related words in a [[phrase]], [[sentence]], or [[paragraph]].@@@@1@53@@danf@17-8-2009 10670030@unknown@formal@none@1@S@A simplified form of this is commonly taught school-age children, in the identification of words as [[noun]]s, [[verb]]s, [[adjective]]s, [[adverb]]s, etc.@@@@1@21@@danf@17-8-2009 10670040@unknown@formal@none@1@S@Once performed by hand, POS tagging is now done in the context of [[computational linguistics]], using [[algorithms]] which associate discrete terms, as well as hidden parts of speech, in accordance with a set of descriptive tags.@@@@1@36@@danf@17-8-2009 10670050@unknown@formal@none@1@S@==History==@@@@1@1@@danf@17-8-2009 10670060@unknown@formal@none@1@S@Research on part-of-speech tagging has been closely tied to [[corpus linguistics]].@@@@1@11@@danf@17-8-2009 10670070@unknown@formal@none@1@S@The first major corpus of English for computer analysis was the [[Brown Corpus]] developed at [[Brown University]] by [[Henry Kucera]] and [[Nelson Francis]], in the mid-1960s.@@@@1@26@@danf@17-8-2009 10670080@unknown@formal@none@1@S@It consists of about 1,000,000 words of running English prose text, made up of 500 samples from randomly chosen publications.@@@@1@20@@danf@17-8-2009 10670090@unknown@formal@none@1@S@Each sample is 2,000 or more words (ending at the first sentence-end after 2,000 words, so that the corpus contains only complete sentences).@@@@1@23@@danf@17-8-2009 10670100@unknown@formal@none@1@S@The [[Brown Corpus]] was painstakingly "tagged" with part-of-speech markers over many years.@@@@1@12@@danf@17-8-2009 10670110@unknown@formal@none@1@S@A first approximation was done with a program by Greene and Rubin, which consisted of a huge handmade list of what categories could co-occur at all.@@@@1@26@@danf@17-8-2009 10670120@unknown@formal@none@1@S@For example, article then noun can occur, but article verb (arguably) cannot.@@@@1@12@@danf@17-8-2009 10670130@unknown@formal@none@1@S@The program got about 70% correct.@@@@1@6@@danf@17-8-2009 10670140@unknown@formal@none@1@S@Its results were repeatedly reviewed and corrected by hand, and later users sent in errata, so that by the late 70s the tagging was nearly perfect (allowing for some cases even human speakers might not agree on).@@@@1@37@@danf@17-8-2009 10670150@unknown@formal@none@1@S@This corpus has been used for innumerable studies of word-frequency and of part-of-speech, and inspired the development of similar "tagged" corpora in many other languages.@@@@1@25@@danf@17-8-2009 10670160@unknown@formal@none@1@S@Statistics derived by analyzing it formed the basis for most later part-of-speech tagging systems, such as CLAWS and [[VOLSUNGA]].@@@@1@19@@danf@17-8-2009 10670170@unknown@formal@none@1@S@However, by this time (2005) it has been superseded by larger corpora such as the 100 million word [[British National Corpus]].@@@@1@21@@danf@17-8-2009 10670180@unknown@formal@none@1@S@For some time, part-of-speech tagging was considered an inseparable part of [[natural language processing]], because there are certain cases where the correct part of speech cannot be decided without understanding the [[semantics]] or even the [[pragmatics]] of the 
context.@@@@1@39@@danf@17-8-2009 10670190@unknown@formal@none@1@S@This is extremely expensive, especially because analyzing the higher levels is much harder when multiple part-of-speech possibilities must be considered for each word.@@@@1@23@@danf@17-8-2009 10670200@unknown@formal@none@1@S@In the mid 1980s, researchers in Europe began to use [[hidden Markov model]]s (HMMs) to disambiguate parts of speech, when working to tag the [[Lancaster-Oslo-Bergen Corpus]] of British English.@@@@1@29@@danf@17-8-2009 10670210@unknown@formal@none@1@S@HMMs involve counting cases (such as from the Brown Corpus), and making a table of the probabilities of certain sequences.@@@@1@20@@danf@17-8-2009 10670220@unknown@formal@none@1@S@For example, once you've seen an article such as 'the', perhaps the next word is a noun 40% of the time, an adjective 40%, and a number 20%.@@@@1@28@@danf@17-8-2009 10670230@unknown@formal@none@1@S@Knowing this, a program can decide that "can" in "the can" is far more likely to be a noun than a verb or a modal.@@@@1@25@@danf@17-8-2009 10670240@unknown@formal@none@1@S@The same method can of course be used to benefit from knowledge about following words.@@@@1@15@@danf@17-8-2009 10670250@unknown@formal@none@1@S@More advanced ("higher order") HMMs learn the probabilities not only of pairs, but triples or even larger sequences.@@@@1@18@@danf@17-8-2009 10670260@unknown@formal@none@1@S@So, for example, if you've just seen an article and a verb, the next item may be very likely a preposition, article, or noun, but even less likely another verb.@@@@1@30@@danf@17-8-2009 10670270@unknown@formal@none@1@S@When several ambiguous words occur together, the possibilities multiply.@@@@1@9@@danf@17-8-2009 10670280@unknown@formal@none@1@S@However, it is easy to enumerate every combination and to assign a relative probability to each one, by multiplying together the probabilities of each choice in turn.@@@@1@27@@danf@17-8-2009 10670290@unknown@formal@none@1@S@The combination with highest probability is then chosen.@@@@1@8@@danf@17-8-2009 10670300@unknown@formal@none@1@S@The European group developed CLAWS, a tagging program that did exactly this, and achieved accuracy in the 93-95% range.@@@@1@19@@danf@17-8-2009 10670310@unknown@formal@none@1@S@It is worth remembering, as [[Eugene Charniak]] points out in ''Statistical techniques for natural language parsing'' [http://www.cs.brown.edu/people/ec/home.html], that merely assigning the most common tag to each known word and the tag "proper noun" to all unknowns, will approach 90% accuracy because many words are unambiguous.@@@@1@45@@danf@17-8-2009 10670320@unknown@formal@none@1@S@CLAWS pioneered the field of HMM-based part of speech tagging, but was quite expensive since it enumerated all possibilities.@@@@1@19@@danf@17-8-2009 10670330@unknown@formal@none@1@S@It sometimes had to resort to backup methods when there were simply too many (the [[Brown Corpus]] contains a case with 17 ambiguous words in a row, and there are words such as "still" that can represent as many as 7 distinct parts of speech).@@@@1@45@@danf@17-8-2009 10670340@unknown@formal@none@1@S@In 1987, [[Steve DeRose]] and [[Ken Church]] independently developed [[dynamic programming]] algorithms to solve the same problem in vastly less time.@@@@1@21@@danf@17-8-2009 10670350@unknown@formal@none@1@S@Their methods were similar to the [[Viterbi algorithm]] known for some time in other fields.@@@@1@15@@danf@17-8-2009 10670360@unknown@formal@none@1@S@DeRose used a table of pairs, while Church used a 
table of triples and an ingenious method of estimating the values for triples that were rare or nonexistent in the Brown Corpus (actual measurement of triple probabilities would require a much larger corpus).@@@@1@43@@danf@17-8-2009 10670370@unknown@formal@none@1@S@Both methods achieved accuracy over 95%.@@@@1@6@@danf@17-8-2009 10670380@unknown@formal@none@1@S@DeRose's 1990 dissertation at [[Brown University]] included analyses of the specific error types, probabilities, and other related data, and replicated his work for Greek, where it proved similarly effective.@@@@1@29@@danf@17-8-2009 10670390@unknown@formal@none@1@S@These findings were surprisingly disruptive to the field of [[Natural Language Processing]].@@@@1@12@@danf@17-8-2009 10670400@unknown@formal@none@1@S@The accuracy reported was higher than the typical accuracy of very sophisticated algorithms that integrated part of speech choice with many higher levels of linguistic analysis: syntax, morphology, semantics, and so on.@@@@1@32@@danf@17-8-2009 10670410@unknown@formal@none@1@S@CLAWS, DeRose's and Church's methods did fail for some of the known cases where semantics is required, but those proved negligibly rare.@@@@1@22@@danf@17-8-2009 10670420@unknown@formal@none@1@S@This convinced many in the field that part-of-speech tagging could usefully be separated out from the other levels of processing; this in turn simplified the theory and practice of computerized language analysis, and encouraged researchers to find ways to separate out other pieces as well.@@@@1@45@@danf@17-8-2009 10670430@unknown@formal@none@1@S@Markov Models are now the standard method for part-of-speech assignment.@@@@1@10@@danf@17-8-2009 10670440@unknown@formal@none@1@S@The methods already discussed involve working from a pre-existing corpus to learn tag probabilities.@@@@1@14@@danf@17-8-2009 10670450@unknown@formal@none@1@S@It is, however, also possible to [[Bootstrapping (linguistics)|bootstrap]] using "unsupervised" tagging.@@@@1@11@@danf@17-8-2009 10670460@unknown@formal@none@1@S@Unsupervised tagging techniques use an untagged corpus for their training data and produce the tagset by induction.@@@@1@17@@danf@17-8-2009 10670470@unknown@formal@none@1@S@That is, they observe patterns in word use, and derive part-of-speech categories themselves.@@@@1@13@@danf@17-8-2009 10670480@unknown@formal@none@1@S@For example, statistics readily reveal that "the", "a", and "an" occur in similar contexts, while "eat" occurs in very different ones.@@@@1@21@@danf@17-8-2009 10670490@unknown@formal@none@1@S@With sufficient iteration, similarity classes of words emerge that are remarkably similar to those human linguists would expect; and the differences themselves sometimes suggest valuable new insights.@@@@1@27@@danf@17-8-2009 10670500@unknown@formal@none@1@S@These two categories can be further subdivided into rule-based, stochastic, and neural approaches.@@@@1@13@@danf@17-8-2009 10670510@unknown@formal@none@1@S@Some current major algorithms for '''part-of-speech tagging''' include the [[Viterbi algorithm]], [[Brill Tagger]], and the [[Baum-Welch algorithm]] (also known as the forward-backward algorithm).@@@@1@23@@danf@17-8-2009 10670520@unknown@formal@none@1@S@[[Hidden Markov model]] and [[visible Markov model]] taggers can both be implemented using the [[Viterbi algorithm]].@@@@1@16@@danf@17-8-2009 10680010@unknown@formal@none@1@S@
Pattern recognition
@@@@1@2@@danf@17-8-2009 10680020@unknown@formal@none@1@S@'''Pattern recognition''' is a sub-topic of [[machine learning]].@@@@1@8@@danf@17-8-2009 10680030@unknown@formal@none@1@S@It can be defined as@@@@1@5@@danf@17-8-2009 10680040@unknown@formal@none@1@S@:"the act of taking in raw data and taking an action based on the [[Category (taxonomy)|category]] of the data".@@@@1@19@@danf@17-8-2009 10680050@unknown@formal@none@1@S@Most research in pattern recognition is about methods for [[supervised learning]] and [[unsupervised learning]].@@@@1@14@@danf@17-8-2009 10680060@unknown@formal@none@1@S@Pattern recognition aims to classify [[data]] ([[pattern]]s) based on either ''[[A priori and a posteriori (philosophy)|a priori]]'' knowledge or on [[statistics|statistical]] information extracted from the patterns.@@@@1@26@@danf@17-8-2009 10680070@unknown@formal@none@1@S@The patterns to be classified are usually groups of measurements or observations, defining points in an appropriate [[space (mathematics)|multidimensional space]].@@@@1@20@@danf@17-8-2009 10680080@unknown@formal@none@1@S@This is in contrast to '''[[pattern matching]]''', where the pattern is rigidly specified.@@@@1@13@@danf@17-8-2009 10680090@unknown@formal@none@1@S@==Overview==@@@@1@1@@danf@17-8-2009 10680100@unknown@formal@none@1@S@A complete pattern recognition system consists of a [[sensor]] that gathers the observations to be classified or described; a [[feature extraction]] mechanism that computes numeric or symbolic information from the observations; and a [[statistical classification|classification]] or description scheme that does the actual job of classifying or describing observations, relying on the extracted features.@@@@1@53@@danf@17-8-2009 10680110@unknown@formal@none@1@S@The classification or description scheme is usually based on the availability of a set of patterns that have already been classified or described.@@@@1@23@@danf@17-8-2009 10680120@unknown@formal@none@1@S@This set of patterns is termed the [[training set]] and the resulting learning strategy is characterized as [[supervised learning]].@@@@1@19@@danf@17-8-2009 10680130@unknown@formal@none@1@S@Learning can also be [[unsupervised learning|unsupervised]], in the sense that the system is not given an ''a priori'' labeling of patterns, instead it establishes the classes itself based on the statistical regularities of the patterns.@@@@1@35@@danf@17-8-2009 10680140@unknown@formal@none@1@S@The classification or description scheme usually uses one of the following approaches: [[statistical classification|statistical]] (or decision theoretic), [[syntactic pattern recognition|syntactic]] (or structural).@@@@1@22@@danf@17-8-2009 10680150@unknown@formal@none@1@S@Statistical pattern recognition is based on statistical characterisations of patterns, assuming that the patterns are generated by a [[probabilistic]] system.@@@@1@20@@danf@17-8-2009 10680160@unknown@formal@none@1@S@Syntactical (or structural) pattern recognition is based on the structural interrelationships of features.@@@@1@13@@danf@17-8-2009 10680170@unknown@formal@none@1@S@A wide range of algorithms can be applied for pattern recognition, from very simple [[Naive Bayes classifier|Bayesian classifiers]] to much more powerful [[Artificial neural network|neural networks]].@@@@1@26@@danf@17-8-2009 10680180@unknown@formal@none@1@S@An intriguing problem in pattern recognition yet to be solved is the relationship between the problem to be solved (data to be classified) and the performance of various pattern 
recognition algorithms (classifiers).@@@@1@32@@danf@17-8-2009 10680190@unknown@formal@none@1@S@Pattern recognition is more complex when templates are used to generate variants.@@@@1@12@@danf@17-8-2009 10680200@unknown@formal@none@1@S@For example, in English, sentences often follow the "N-VP" (noun - verb phrase) pattern, but some knowledge of the English language is required to detect the pattern.@@@@1@27@@danf@17-8-2009 10680210@unknown@formal@none@1@S@Pattern recognition is studied in many fields, including [[psychology]], [[ethology]], and [[computer science]].@@@@1@13@@danf@17-8-2009 10680220@unknown@formal@none@1@S@[[Holographic associative memory]] is another type of pattern matching scheme where a target small patterns can be searched from a large set of learned patterns based on cognitive meta-weight.@@@@1@29@@danf@17-8-2009 10680230@unknown@formal@none@1@S@==Uses==@@@@1@1@@danf@17-8-2009 10680240@unknown@formal@none@1@S@Within medical science pattern recognition creates the basis for [[computer-aided diagnosis]] (CAD) systems.@@@@1@13@@danf@17-8-2009 10680250@unknown@formal@none@1@S@CAD describes a procedure that supports the doctor's interpretations and findings.@@@@1@11@@danf@17-8-2009 10680260@unknown@formal@none@1@S@Typical applications are automatic [[speech recognition]], [[document classification|classification of text into several categories]] (e.g. spam/non-spam email messages), the [[handwriting recognition|automatic recognition of handwritten postal codes]] on postal envelopes, or the [[facial recognition system|automatic recognition of images]] of human faces.@@@@1@39@@danf@17-8-2009 10680270@unknown@formal@none@1@S@The last two examples form the subtopic [[image analysis]] of pattern recognition that deals with digital images as input to pattern recognition systems.@@@@1@23@@danf@17-8-2009 10690010@unknown@formal@none@1@S@
Phrase
@@@@1@1@@danf@17-8-2009 10690020@unknown@formal@none@1@S@In [[grammar]], a '''phrase''' is a group of [[word]]s that functions as a single unit in the [[syntax]] of a [[Sentence (linguistics)|sentence]].@@@@1@22@@danf@17-8-2009 10690030@unknown@formal@none@1@S@For example, ''the house at the end of the street'' (example 1) is a phrase.@@@@1@15@@danf@17-8-2009 10690040@unknown@formal@none@1@S@It acts like a noun.@@@@1@5@@danf@17-8-2009 10690050@unknown@formal@none@1@S@It contains the phrase ''at the end of the street'' (example 2), a prepositional phrase which acts like an adjective.@@@@1@20@@danf@17-8-2009 10690060@unknown@formal@none@1@S@Example 2 could be replaced by ''white'' to make the phrase ''the white house''.@@@@1@14@@danf@17-8-2009 10690070@unknown@formal@none@1@S@Examples 1 and 2 contain the phrase ''the end of the street'' (example 3) which acts like a noun.@@@@1@19@@danf@17-8-2009 10690080@unknown@formal@none@1@S@It could be replaced by ''the cross-roads'' to give ''the house at the cross-roads''.@@@@1@14@@danf@17-8-2009 10690090@unknown@formal@none@1@S@Most phrases have a head or central word which defines the type of phrase.@@@@1@13@@danf@17-8-2009 10690100@unknown@formal@none@1@S@This word is called the [[head (linguistics)|head]] of the phrase.@@@@1@10@@danf@17-8-2009 10690110@unknown@formal@none@1@S@In English, the head is often the first word of the phrase.@@@@1@12@@danf@17-8-2009 10690120@unknown@formal@none@1@S@Some phrases, however, can be headless.@@@@1@6@@danf@17-8-2009 10690130@unknown@formal@none@1@S@For example, ''the rich'' is a noun phrase composed of a determiner and an adjective, but no noun.@@@@1@18@@danf@17-8-2009 10690140@unknown@formal@none@1@S@Phrases may be classified by the type of head they take:@@@@1@11@@danf@17-8-2009 10690150@unknown@formal@none@1@S@*[[Prepositional phrase]] (PP) with a [[preposition]] as head (e.g. ''in love'', ''over the rainbow'').@@@@1@14@@danf@17-8-2009 10690160@unknown@formal@none@1@S@Languages that use [[postposition]]s instead have [[postpositional phrase]]s.@@@@1@8@@danf@17-8-2009 10690170@unknown@formal@none@1@S@The two types are sometimes referred to collectively as [[adpositional phrase]]s.@@@@1@11@@danf@17-8-2009 10690180@unknown@formal@none@1@S@*[[Noun phrase]] (NP) with a [[noun]] as head (e.g. ''the black cat'', ''a cat on the mat'')@@@@1@17@@danf@17-8-2009 10690190@unknown@formal@none@1@S@*[[Verb phrase]] (VP) with a [[verb]] as head (e.g. ''eat cheese'', ''jump up and down'')@@@@1@15@@danf@17-8-2009 10690200@unknown@formal@none@1@S@*[[Adjectival phrase]] with an [[adjective]] as head (e.g. ''full of toys'')@@@@1@11@@danf@17-8-2009 10690210@unknown@formal@none@1@S@*[[Adverbial phrase]] with [[adverb]] as head (e.g.
''very carefully'')@@@@1@9@@danf@17-8-2009 10690220@unknown@formal@none@1@S@== Formal definition ==@@@@1@4@@danf@17-8-2009 10690230@unknown@formal@none@1@S@A '''phrase''' is a [[syntax|syntactic]] structure which has syntactic properties derived from its [[head (linguistics)|head]].@@@@1@15@@danf@17-8-2009 10690240@unknown@formal@none@1@S@== Complexity ==@@@@1@3@@danf@17-8-2009 10690250@unknown@formal@none@1@S@A complex phrase consists of several words, whereas a simple phrase consists of only one word.@@@@1@16@@danf@17-8-2009 10690260@unknown@formal@none@1@S@This terminology is especially often used with [[verb]] phrases:@@@@1@9@@danf@17-8-2009 10690270@unknown@formal@none@1@S@* the simple past and present are simple verb forms, which require just one verb@@@@1@13@@danf@17-8-2009 10690280@unknown@formal@none@1@S@* complex verbs have one or two [[grammatical aspect|aspect]]s added, and hence require an additional two or three words@@@@1@17@@danf@17-8-2009 10690290@unknown@formal@none@1@S@"Complex", which is phrase-level, is often confused with "[[compound (linguistics)|compound]]", which is [[word]]-level.@@@@1@13@@danf@17-8-2009 10690300@unknown@formal@none@1@S@However, there are certain phenomena that formally seem to be phrases but semantically are more like compounds, like "women's magazines", which has the form of a possessive noun phrase, but which refers (just like a compound) to one specific [[lexeme]] (i.e. a magazine for women and not some magazine owned by a woman).@@@@1@53@@danf@17-8-2009 10690310@unknown@formal@none@1@S@== Semiotic approaches to the concept of "phrase" ==@@@@1@9@@danf@17-8-2009 10690320@unknown@formal@none@1@S@In more [[semiotic]] approaches to language, such as the more cognitivist versions of [[construction grammar]], a phrasal structure is not only a certain formal combination of word types whose features are inherited from the head.@@@@1@35@@danf@17-8-2009 10690330@unknown@formal@none@1@S@Here each phrasal structure also expresses some type of [[concept]]ual content, be it specific or abstract.@@@@1@16@@danf@17-8-2009
Portuguese language
@@@@1@2@@danf@17-8-2009 10700020@unknown@formal@none@1@S@'''Portuguese''' ( or ''língua portuguesa'') is a [[Romance language]] that originated in what is now [[Galicia (Spain)]] and [[Portugal|northern Portugal]] from the [[Latin language|Latin]] spoken by [[Romanization (cultural)|romanized]] [[Pre-Roman peoples of the Iberian Peninsula]] (namely the [[Gallaeci]], the [[Lusitanians]], the [[Celtici]] and the [[Conii]]) about 2000 years ago.@@@@1@48@@danf@17-8-2009 10700030@unknown@formal@none@1@S@It spread worldwide in the 15th and 16th centuries as Portugal established a [[Portuguese Empire|colonial and commercial empire]] (1415–1999) which spanned from [[Brazil]] in the [[Americas]] to [[Goa]] in [[India]] and [[Macau]] in [[China]], in fact it was used exclusively on the island of [[Sri Lanka]] as the [[lingua franca]] for almost 350 years.@@@@1@54@@danf@17-8-2009 10700040@unknown@formal@none@1@S@During that time, many [[Portuguese Creole|creole languages based on Portuguese]] also appeared around the world, especially in [[Africa]], [[Asia]], and the [[Caribbean]].@@@@1@22@@danf@17-8-2009 10700050@unknown@formal@none@1@S@Today it is one of the world's major languages, [[List of languages by number of native speakers|ranked 6th]] according to number of native speakers (approximately 177 million).@@@@1@27@@danf@17-8-2009 10700060@unknown@formal@none@1@S@It is the language with the largest number of speakers in [[South America]], spoken by nearly all of Brazil's population, which amounts to over 51% of the continent's population even though it is the only Portuguese-speaking nation in [[the Americas]].@@@@1@40@@danf@17-8-2009 10700070@unknown@formal@none@1@S@It is also a major lingua franca in Portugal's former colonial possessions in Africa.@@@@1@14@@danf@17-8-2009 10700080@unknown@formal@none@1@S@It is the official language of ten countries (see the table on the right), also being co-official with [[Spanish language|Spanish]] and [[French language|French]] in [[Equatorial Guinea]], with [[Standard Cantonese|Cantonese]] [[Chinese language|Chinese]] in the Chinese special administrative region of [[Macau]], and with [[Tetum]] in [[East Timor]].@@@@1@45@@danf@17-8-2009 10700090@unknown@formal@none@1@S@There are sizable communities of Portuguese-speakers in various regions of North America, notably in the [[United States]] ([[New Jersey]], [[New England]] and south [[Florida]]) and in [[Ontario]], [[Canada]].@@@@1@28@@danf@17-8-2009 10700100@unknown@formal@none@1@S@[[Spain|Spanish]] author [[Miguel de Cervantes]] once called Portuguese "the sweet language", while Brazilian writer [[Olavo Bilac]] poetically described it as ''a última flor do Lácio, inculta e bela'': "the last flower of [[Latium]], wild and beautiful".@@@@1@36@@danf@17-8-2009 10700110@unknown@formal@none@1@S@==Geographic distribution==@@@@1@2@@danf@17-8-2009 10700120@unknown@formal@none@1@S@Today, Portuguese is the [[official language]] of [[Angola]], [[Brazil]], [[Cape Verde]], [[Guinea-Bissau]], [[Portugal]], [[São Tomé and Príncipe]] and [[Mozambique]].@@@@1@19@@danf@17-8-2009 10700130@unknown@formal@none@1@S@It is also one of the official languages of [[Equatorial Guinea]] (with [[Spanish language|Spanish]] and [[French language|French]]), the [[Special Administrative Region of the People's Republic of China|Chinese special administrative region]] of [[Macau]] (with [[Chinese language|Chinese]]), and [[East Timor]], (with [[Tetum]]).@@@@1@40@@danf@17-8-2009 10700140@unknown@formal@none@1@S@It is a [[First 
language|native language]] of most of the population in Portugal (100%), Brazil (99%), Angola (60%), and São Tomé and Príncipe (50%), and it is spoken by a [[plurality]] of the population of Mozambique (40%), though only 6.5% are native speakers.@@@@1@43@@danf@17-8-2009 10700150@unknown@formal@none@1@S@No data is available for Cape Verde, but almost all the population is bilingual, and the monolingual population speaks [[Cape Verdean Creole]].@@@@1@22@@danf@17-8-2009 10700160@unknown@formal@none@1@S@Small Portuguese-speaking communities subsist in former overseas colonies of Portugal such as Macau, where it is spoken as a first language by 0.6% of the population and East Timor.@@@@1@29@@danf@17-8-2009 10700170@unknown@formal@none@1@S@[[Uruguay]] gave Portuguese an equal status to Spanish in its educational system at the north border with Brazil.@@@@1@18@@danf@17-8-2009 10700180@unknown@formal@none@1@S@In the rest of the country, it's taught as an obligatory subject beginning by the 6th grade.@@@@1@17@@danf@17-8-2009 10700190@unknown@formal@none@1@S@It is also spoken by substantial immigrant communities, though not official, in [[Andorra]], [[France]], [[Luxembourg]], [[Jersey]] (with a statistically significant Portuguese-speaking community of approximately 10,000 people), [[Paraguay]], [[Namibia]], [[South Africa]], [[Switzerland]], [[Venezuela]] and in the [[U.S.]] states of [[California]], [[Connecticut]], [[Florida]], [[Massachusetts]], [[New Jersey]], [[New York]] and [[Rhode Island]].@@@@1@49@@danf@17-8-2009 10700200@unknown@formal@none@1@S@In some parts of India, such as [[Goa]] and [[Daman and Diu]] Portuguese is still spoken.@@@@1@16@@danf@17-8-2009 10700210@unknown@formal@none@1@S@There are also significant populations of Portuguese speakers in [[Canada]] (mainly concentrated in and around [[Toronto]]) [[Bermuda]] and [[Netherlands Antilles]].@@@@1@20@@danf@17-8-2009 10700220@unknown@formal@none@1@S@Portuguese is an official language of several international organizations.@@@@1@9@@danf@17-8-2009 10700230@unknown@formal@none@1@S@The [[Community of Portuguese Language Countries]] (with the Portuguese acronym CPLP) consists of the eight independent countries that have Portuguese as an official language.@@@@1@24@@danf@17-8-2009 10700240@unknown@formal@none@1@S@It is also an official language of the [[European Union]], [[Mercosul]], the [[Organization of American States]], the [[Organization of Ibero-American States]], the [[Union of South American Nations]], and the [[African Union]] (one of the working languages) and one of the official languages of other organizations.@@@@1@45@@danf@17-8-2009 10700250@unknown@formal@none@1@S@The Portuguese language is gaining popularity in Africa, Asia, and South America as a second language for study.@@@@1@18@@danf@17-8-2009 10700260@unknown@formal@none@1@S@Portuguese and Spanish are the fastest-growing European languages, and, according to estimates by UNESCO, Portuguese is the language with the highest potential for growth as an international language in southern Africa and South America.@@@@1@34@@danf@17-8-2009 10700270@unknown@formal@none@1@S@The Portuguese-speaking African countries are expected to have a combined population of 83 million by 2050.@@@@1@16@@danf@17-8-2009 10700280@unknown@formal@none@1@S@Since 1991, when Brazil signed into the economic market of Mercosul with other South American nations, such as Argentina, Uruguay, and Paraguay, there has been an increase in interest in the study of Portuguese in those South American 
countries.@@@@1@39@@danf@17-8-2009 10700290@unknown@formal@none@1@S@The demographic weight of Brazil in the continent will continue to strengthen the presence of the language in the region.@@@@1@20@@danf@17-8-2009 10700300@unknown@formal@none@1@S@Although in the early 21st century, after Macau was ceded to China in 1999, the use of Portuguese was in decline in Asia, it is becoming a language of opportunity there; mostly because of East Timor's boost in the number of speakers in the last five years but also because of increased Chinese diplomatic and financial ties with Portuguese-speaking countries.@@@@1@60@@danf@17-8-2009 10700310@unknown@formal@none@1@S@In July 2007, President Teodoro Obiang Nguema announced his government's decision to make Portuguese [[Equatorial Guinea]]'s third official language, in order to meet the requirements to apply for full membership of the [[Community of Portuguese Language Countries]].@@@@1@37@@danf@17-8-2009 10700320@unknown@formal@none@1@S@This upgrading from its current Associate Observer condition would result in Equatorial Guinea being able to access several professional and academic exchange programs and the facilitation of cross-border circulation of citizens.@@@@1@31@@danf@17-8-2009 10700330@unknown@formal@none@1@S@Its application is currently being assessed by other CPLP members.@@@@1@10@@danf@17-8-2009 10700340@unknown@formal@none@1@S@In March 1994 the [[Bosque de Portugal]] (Portugal's Woods) was founded in the Brazilian city of [[Curitiba]].@@@@1@17@@danf@17-8-2009 10700350@unknown@formal@none@1@S@The park houses the Portuguese Language Memorial, which honors the Portuguese immigrants and the countries that adopted the Portuguese language.@@@@1@20@@danf@17-8-2009 10700360@unknown@formal@none@1@S@Originally there were seven nations represented with pillars, but the independence of [[East Timor]] brought yet another pillar for that nation in 2007.@@@@1@23@@danf@17-8-2009 10700370@unknown@formal@none@1@S@In March 2006, the [[Museum of the Portuguese Language]], an interactive museum about the Portuguese language, was founded in [[São Paulo]], Brazil, the city with the largest number of Portuguese speakers in the world.@@@@1@34@@danf@17-8-2009 10700380@unknown@formal@none@1@S@==Dialects==@@@@1@1@@danf@17-8-2009 10700390@unknown@formal@none@1@S@Portuguese is a [[pluricentric language]] with two main groups of [[dialect]]s, those of [[Brazil]] and those of the [[Old World]].@@@@1@20@@danf@17-8-2009 10700400@unknown@formal@none@1@S@For historical reasons, the dialects of Africa and Asia are generally closer to those of Portugal than the Brazilian dialects, although in some aspects of their phonetics, especially the pronunciation of unstressed vowels, they resemble [[Brazilian Portuguese]] more than [[European Portuguese]].@@@@1@41@@danf@17-8-2009 10700410@unknown@formal@none@1@S@They have not been studied as widely as European and Brazilian Portuguese.@@@@1@12@@danf@17-8-2009 10700420@unknown@formal@none@1@S@Audio samples of some dialects of Portuguese are available below.@@@@1@10@@danf@17-8-2009 10700430@unknown@formal@none@1@S@There are some differences between the areas but these are the best approximations possible.@@@@1@14@@danf@17-8-2009 10700440@unknown@formal@none@1@S@For example, the ''caipira'' dialect has some differences from the one of Minas Gerais, but in general it is very close.@@@@1@21@@danf@17-8-2009 10700450@unknown@formal@none@1@S@A good example of Brazilian Portuguese may be found in the capital city, [[Brasília]], because of the 
generalized population from all parts of the country.@@@@1@25@@danf@17-8-2009 10700460@unknown@formal@none@1@S@'''[[Angola]]'''@@@@1@1@@danf@17-8-2009 10700470@unknown@formal@none@1@S@# ''Benguelense'' — [[Benguela]] province.@@@@1@5@@danf@17-8-2009 10700480@unknown@formal@none@1@S@# [http://www.instituto-camoes.pt/cvc/hlp/geografia/som85.html ''Luandense''] — [[Luanda]] province.@@@@1@6@@danf@17-8-2009 10700490@unknown@formal@none@1@S@# ''Sulista'' — South of Angola.@@@@1@6@@danf@17-8-2009 10700500@unknown@formal@none@1@S@'''[[Brazil]]'''@@@@1@1@@danf@17-8-2009 10700510@unknown@formal@none@1@S@# ''[[Caipira]]'' — States of [[São Paulo (state)|São Paulo]] (countryside; the city of São Paulo and the eastern areas of the state have their own dialect, called ''paulistano''); southern [[Minas Gerais]], northern [[Paraná (state)|Paraná]], [[Goiás]] and [[Mato Grosso do Sul]].@@@@1@40@@danf@17-8-2009 10700520@unknown@formal@none@1@S@# ''Cearense'' — [[Ceará]].@@@@1@4@@danf@17-8-2009 10700530@unknown@formal@none@1@S@# ''Baiano'' — [[Bahia]].@@@@1@4@@danf@17-8-2009 10700540@unknown@formal@none@1@S@# [http://www.instituto-camoes.pt/cvc/hlp/geografia/som90.html ''Fluminense''] — Variants spoken in the states of [[Rio de Janeiro (state)|Rio de Janeiro]] and [[Espírito Santo]] (excluding the city of Rio de Janeiro and its adjacent metropolitan areas, which have their own dialect, called ''[[carioca]]'').@@@@1@38@@danf@17-8-2009 10700550@unknown@formal@none@1@S@# ''[[Gaucho|Gaúcho]]'' — [[Rio Grande do Sul]].@@@@1@7@@danf@17-8-2009 10700560@unknown@formal@none@1@S@(There are many distinct accents in Rio Grande do Sul, mainly due to the heavy influx of European immigrants of diverse origins, those which have settled several colonies throughout the state.)@@@@1@31@@danf@17-8-2009 10700570@unknown@formal@none@1@S@# ''[[Mineiro]]'' — [[Minas Gerais]] (not prevalent in the [[Triângulo Mineiro]], southern and southeastern [[Minas Gerais]]).@@@@1@16@@danf@17-8-2009 10700580@unknown@formal@none@1@S@# [http://www.instituto-camoes.pt/cvc/hlp/geografia/som91.html ''Nordestino''] — [[Northeast Region, Brazil|northeastern states of Brazil]] ([[Pernambuco]] and [[Rio Grande do Norte]] have a particular way of speaking).@@@@1@22@@danf@17-8-2009 10700590@unknown@formal@none@1@S@# ''Nortista'' — [[Amazon Basin]] states.@@@@1@6@@danf@17-8-2009 10700600@unknown@formal@none@1@S@# ''Paulistano'' — Variants spoken around [[São Paulo]] city and the eastern areas of São Paulo state.@@@@1@17@@danf@17-8-2009 10700610@unknown@formal@none@1@S@# ''Sertanejo'' — States of [[Goiás]] and [[Mato Grosso]] (the city of [[Cuiabá]] has a particular way of speaking).@@@@1@19@@danf@17-8-2009 10700620@unknown@formal@none@1@S@# ''Sulista'' — Variants spoken in the areas between the northern regions of [[Rio Grande do Sul]] and southern regions of São Paulo state.@@@@1@24@@danf@17-8-2009 10700630@unknown@formal@none@1@S@(The cities of [[Curitiba]], [[Florianópolis]], and [[Itapetininga]] have fairly distinct accents as well.)@@@@1@13@@danf@17-8-2009 10700640@unknown@formal@none@1@S@'''[[Portugal]]'''@@@@1@1@@danf@17-8-2009 10700650@unknown@formal@none@1@S@# [http://www.instituto-camoes.pt/cvc/hlp/geografia/som69.html ''Açoriano''] (Azorean) — [[Azores]].@@@@1@6@@danf@17-8-2009 10700660@unknown@formal@none@1@S@# [http://www.instituto-camoes.pt/cvc/hlp/geografia/som40.html ''Alentejano''] — [[Alentejo]]@@@@1@5@@danf@17-8-2009 10700670@unknown@formal@none@1@S@# [http://www.instituto-camoes.pt/cvc/hlp/geografia/som44.html ''Algarvio''] — 
[[Algarve]] (there is a particular dialect in a small part of western Algarve).@@@@1@17@@danf@17-8-2009 10700680@unknown@formal@none@1@S@# [http://www.instituto-camoes.pt/cvc/hlp/geografia/som1.html ''Alto-Minhoto''] — North of [[Braga]] (hinterland).@@@@1@8@@danf@17-8-2009 10700690@unknown@formal@none@1@S@# [http://www.instituto-camoes.pt/cvc/hlp/geografia/som49.html ''Baixo-Beirão''; ''Alto-Alentejano''] — Central Portugal (hinterland).@@@@1@8@@danf@17-8-2009 10700700@unknown@formal@none@1@S@# [http://www.instituto-camoes.pt/cvc/hlp/geografia/som9.html ''Beirão''] — Central Portugal.@@@@1@6@@danf@17-8-2009 10700710@unknown@formal@none@1@S@# [http://www.instituto-camoes.pt/cvc/hlp/geografia/som22.html ''Estremenho''] — Regions of [[Coimbra]] and [[Lisbon]] (the Lisbon dialect has some peculiar features not shared with that of Coimbra).@@@@1@23@@danf@17-8-2009 10700720@unknown@formal@none@1@S@# [http://www.instituto-camoes.pt/cvc/hlp/geografia/som60.html ''Madeirense''] (Madeiran) — [[Madeira]].@@@@1@6@@danf@17-8-2009 10700730@unknown@formal@none@1@S@# [http://www.instituto-camoes.pt/cvc/hlp/geografia/som14.html ''Nortenho''] — Regions of Braga and [[Porto]].@@@@1@9@@danf@17-8-2009 10700740@unknown@formal@none@1@S@# [http://www.instituto-camoes.pt/cvc/hlp/geografia/som6.html ''Transmontano''] — [[Trás-os-Montes e Alto Douro]].@@@@1@8@@danf@17-8-2009 10700750@unknown@formal@none@1@S@Other countries@@@@1@2@@danf@17-8-2009 10700760@unknown@formal@none@1@S@* '''[[Cape Verde]]''' — [http://www.instituto-camoes.pt/cvc/hlp/geografia/som87.html ''Português cabo-verdiano''] ([[Cape Verdean Portuguese]])@@@@1@10@@danf@17-8-2009 10700770@unknown@formal@none@1@S@* '''[[Daman and Diu]]''', India — ''Damaense''.@@@@1@7@@danf@17-8-2009 10700780@unknown@formal@none@1@S@* '''[[East Timor]]''' — [http://www.instituto-camoes.pt/cvc/hlp/geografia/som84.html ''Timorense''] ([[East Timorese Portuguese|East Timorese]])@@@@1@10@@danf@17-8-2009 10700790@unknown@formal@none@1@S@* '''[[Goa]]''', India — ''Goês''.@@@@1@5@@danf@17-8-2009 10700800@unknown@formal@none@1@S@* '''[[Guinea-Bissau]]''' — [http://www.instituto-camoes.pt/cvc/hlp/geografia/som88.html ''Guineense''] ([[Guinean Portuguese]]).@@@@1@7@@danf@17-8-2009 10700810@unknown@formal@none@1@S@* '''[[Macau]]''', China — [http://www.instituto-camoes.pt/cvc/hlp/geografia/som92.html ''Macaense''] ([[Macanese Portuguese|Macanese]])@@@@1@8@@danf@17-8-2009 10700820@unknown@formal@none@1@S@* '''[[Mozambique]]''' — [http://www.instituto-camoes.pt/cvc/hlp/geografia/som89.html ''Moçambicano''] ([[Mozambican Portuguese|Mozambican]])@@@@1@7@@danf@17-8-2009 10700830@unknown@formal@none@1@S@* '''[[São Tomé and Príncipe]]''' — [http://www.instituto-camoes.pt/cvc/hlp/geografia/som83.html ''Santomense'']@@@@1@8@@danf@17-8-2009 10700840@unknown@formal@none@1@S@* '''[[Uruguay]]''' — [[Riverense Portuñol language|''Dialectos Portugueses del Uruguay (DPU)'']].@@@@1@10@@danf@17-8-2009 10700850@unknown@formal@none@1@S@Differences between dialects are mostly of [[accent (linguistics)|accent]] and [[vocabulary]], but between the Brazilian dialects and other dialects, especially in their most colloquial forms, there can also be some grammatical differences.@@@@1@31@@danf@17-8-2009 10700860@unknown@formal@none@1@S@The [[Portuguese creole|Portuguese-based creole]]s spoken in various parts of Africa, Asia, and the Americas are independent languages which should not be confused with Portuguese itself.@@@@1@25@@danf@17-8-2009 
10700870@unknown@formal@none@1@S@==History==@@@@1@1@@danf@17-8-2009 10700880@unknown@formal@none@1@S@Arriving in the Iberian Peninsula in 216 BC, the Romans brought with them the [[Latin language]], from which all Romance languages descend.@@@@1@22@@danf@17-8-2009 10700890@unknown@formal@none@1@S@The language was spread by arriving Roman soldiers, settlers and merchants, who built Roman cities mostly near the settlements of previous civilizations.@@@@1@22@@danf@17-8-2009 10700900@unknown@formal@none@1@S@Between AD 409 and 711, as the Roman Empire collapsed in Western Europe, the Iberian Peninsula was conquered by Germanic peoples ([[Migration Period]]).@@@@1@23@@danf@17-8-2009 10700910@unknown@formal@none@1@S@The occupiers, mainly [[Suebi]] and [[Visigoths]], quickly adopted late Roman culture and the [[Vulgar Latin]] dialects of the peninsula.@@@@1@19@@danf@17-8-2009 10700920@unknown@formal@none@1@S@After the [[Moors|Moorish]] invasion of 711, [[Arabic language|Arabic]] became the administrative language in the conquered regions, but most of the population continued to speak a form of [[Romance languages|Romance]] commonly known as [[Mozarabic]].@@@@1@33@@danf@17-8-2009 10700930@unknown@formal@none@1@S@The influence exerted by Arabic on the Romance dialects spoken in the Christian kingdoms of the north was small, affecting mainly their lexicon.@@@@1@23@@danf@17-8-2009 10700940@unknown@formal@none@1@S@The earliest surviving records of a distinctively Portuguese language are administrative documents of the 9th century, still interspersed with many Latin phrases.@@@@1@22@@danf@17-8-2009 10700950@unknown@formal@none@1@S@Today this phase is known as Proto-Portuguese (between the 9th and the 12th centuries).@@@@1@14@@danf@17-8-2009 10700960@unknown@formal@none@1@S@In the first period of Old Portuguese — [[Galician-Portuguese]] Period (from the 12th to the 14th century) — the language gradually came into general use.@@@@1@25@@danf@17-8-2009 10700970@unknown@formal@none@1@S@For some time, it was the language of preference for [[lyric poetry]] in Christian Hispania, much like [[Occitan]] was the language of the [[Occitan literature#Poetry_of_the_troubadours|poetry of the troubadours]].@@@@1@28@@danf@17-8-2009 10700980@unknown@formal@none@1@S@Portugal was formally recognized as an independent kingdom by the [[Kingdom of Leon]] in 1143, with [[Afonso I of Portugal|Afonso Henriques]] as king.@@@@1@23@@danf@17-8-2009 10700990@unknown@formal@none@1@S@In 1290, king [[Denis of Portugal|Dinis]] created the first Portuguese university in Lisbon (the ''Estudos Gerais'', later moved to [[Coimbra]]) and decreed that Portuguese, then simply called the "common language" should be known as the Portuguese language and used officially.@@@@1@40@@danf@17-8-2009 10701000@unknown@formal@none@1@S@In the second period of Old Portuguese, from the 14th to the 16th century, with the [[Age of discovery|Portuguese discoveries]], the language was taken to many regions of [[Asia]], [[Africa]] and the [[Americas]] (nowadays, the great majority of Portuguese speakers live in Brazil, in South America).@@@@1@46@@danf@17-8-2009 10701010@unknown@formal@none@1@S@By the 16th century it had become a ''[[lingua franca]]'' in Asia and Africa, used not only for colonial administration and trade but also for communication between local officials and Europeans of all nationalities.@@@@1@34@@danf@17-8-2009 10701020@unknown@formal@none@1@S@Its spread was helped by mixed marriages between Portuguese and local people, and by its association 
with [[Roman Catholic]] [[missionary]] efforts, which led to the formation of a [[creole language]] called [[Kristang language|Kristang]] in many parts of Asia (from the word ''cristão'', "Christian").@@@@1@43@@danf@17-8-2009 10701030@unknown@formal@none@1@S@The language continued to be popular in parts of Asia until the 19th century.@@@@1@14@@danf@17-8-2009 10701040@unknown@formal@none@1@S@Some Portuguese-speaking Christian communities in [[India]], [[Sri Lanka]], [[Malaysia]], and [[Indonesia]] preserved their language even after they were isolated from Portugal.@@@@1@21@@danf@17-8-2009 10701050@unknown@formal@none@1@S@The end of the Old Portuguese period was marked by the publication of the ''Cancioneiro Geral'' by [[Garcia de Resende]], in 1516.@@@@1@22@@danf@17-8-2009 10701060@unknown@formal@none@1@S@The early times of Modern Portuguese, which spans from the 16th century to present day, were characterized by an increase in the number of learned words borrowed from Classical Latin and Classical Greek since the Renaissance, which greatly enriched the lexicon.@@@@1@41@@danf@17-8-2009 10701070@unknown@formal@none@1@S@===Characterization===@@@@1@1@@danf@17-8-2009 10701080@unknown@formal@none@1@S@A distinctive feature of Portuguese is that it preserved the stressed vowels of [[Vulgar Latin]], which became diphthongs in other Romance languages; cf. Fr. ''pierre'', Sp. ''piedra'', It. ''pietra'', Port. ''pedra'', from Lat. ''petra''; or Sp. ''fuego'', It. ''fuoco'', Port. ''fogo'', from Lat. ''focum''.@@@@1@44@@danf@17-8-2009 10701090@unknown@formal@none@1@S@Another characteristic of early Portuguese was the loss of [[:wiktionary:intervocalic|intervocalic]] ''l'' and ''n'', sometimes followed by the merger of the two surrounding vowels, or by the insertion of an [[epenthesis|epenthetic vowel]] between them: cf. Lat. ''salire'', ''tenere'', ''catena'', Sp. ''salir'', ''tener'', ''cadena'', Port. ''sair'', ''ter'', ''cadeia''.@@@@1@46@@danf@17-8-2009 10701100@unknown@formal@none@1@S@When the [[elision|elided]] consonant was ''n'', it often [[nasalization|nasalized]] the preceding vowel: cf. Lat. ''manum'', ''rana'', ''bonum'', Port. ''mão'', ''rãa'', ''bõo'' (now ''mão'', ''rã'', ''bom'').@@@@1@25@@danf@17-8-2009 10701110@unknown@formal@none@1@S@This process was the source of most of the nasal diphthongs which are typical of Portuguese.@@@@1@16@@danf@17-8-2009 10701120@unknown@formal@none@1@S@In particular, the Latin endings ''-anem'', ''-anum'' and ''-onem'' became ''-ão'' in most cases, cf. Lat. ''canem'', ''germanum'', ''rationem'' with Modern Port. ''cão'', ''irmão'', ''razão'', and their plurals ''-anes'', ''-anos'', ''-ones'' normally became ''-ães'', ''-ãos'', ''-ões'', cf. 
''cães'', ''irmãos'', ''razões''.@@@@1@40@@danf@17-8-2009 10701130@unknown@formal@none@1@S@===Movement to make Portuguese an official language of the UN===@@@@1@10@@danf@17-8-2009 10701140@unknown@formal@none@1@S@A growing number of people in the Portuguese-speaking media and on the internet are presenting the case to the CPLP and other organizations for a debate in the [[Lusophone]] community, with the purpose of bringing forward a petition to make Portuguese an official language of the United Nations.@@@@1@52@@danf@17-8-2009 10701150@unknown@formal@none@1@S@In October 2005, during the international convention of the [http://www.elosinternacional.com.br/index.htm Elos Club International] held in Tavira, Portugal, a petition was written and unanimously approved; its text can be found on the internet under the title ''Petição Para Tornar Oficial o Idioma Português na ONU''.@@@@1@47@@danf@17-8-2009 10701160@unknown@formal@none@1@S@Romulo Alexandre Soares, president of the Brazil-Portugal Chamber, highlights that the positioning of Brazil in the international arena as one of the emergent powers of the 21st century, the size of its population, and the worldwide presence of the language provide legitimacy for, and justify, a petition to the UN to make Portuguese one of its official languages.@@@@1@61@@danf@17-8-2009 10701170@unknown@formal@none@1@S@==Vocabulary==@@@@1@1@@danf@17-8-2009 10701180@unknown@formal@none@1@S@Most of the lexicon of Portuguese is derived from Latin.@@@@1@10@@danf@17-8-2009 10701190@unknown@formal@none@1@S@Nevertheless, because of the [[Moors|Moorish]] occupation of the [[Iberian Peninsula]] during the Middle Ages, and the participation of Portugal in the [[Age of Discovery]], it has adopted loanwords from all over the world.@@@@1@33@@danf@17-8-2009 10701200@unknown@formal@none@1@S@Very few Portuguese words can be traced to the [[Pre-Roman peoples of the Iberian Peninsula|pre-Roman inhabitants of Portugal]], which included the [[Gallaeci]], [[Lusitanians]], [[Celtici]] and [[Cynetes]].@@@@1@26@@danf@17-8-2009 10701210@unknown@formal@none@1@S@The [[Phoenicians]] and [[Carthaginians]], briefly present, also left some scarce traces.@@@@1@11@@danf@17-8-2009 10701220@unknown@formal@none@1@S@Some notable examples are ''abóbora'' "pumpkin" and ''bezerro'' "year-old calf", from the nearby [[Celtiberian language]] (probably through the Celtici); ''cerveja'' "beer", from [[Celtic languages|Celtic]]; ''saco'' "bag", from [[Phoenician language|Phoenician]]; and ''cachorro'' "dog, puppy", from [[Basque language|Basque]].@@@@1@36@@danf@17-8-2009 10701230@unknown@formal@none@1@S@In the 5th century, the Iberian Peninsula (the [[Ancient Rome|Roman]] [[Hispania]]) was conquered by the [[Germanic peoples|Germanic]] [[Suevi]] and [[Visigoths]].@@@@1@20@@danf@17-8-2009 10701240@unknown@formal@none@1@S@As they adopted the Roman civilization and language, however, these people contributed only a few words to the lexicon, mostly related to warfare — such as ''espora'' "spur", ''estaca'' "stake", and ''guerra'' "war", from [[Gothic language|Gothic]] ''*spaúra'', ''*stakka'', and ''*wirro'', respectively.@@@@1@41@@danf@17-8-2009 10701250@unknown@formal@none@1@S@Between the 9th and 15th centuries Portuguese acquired about 1000 words from [[Arabic language|Arabic]] through the influence of [[al-Andalus|Moorish Iberia]].@@@@1@19@@danf@17-8-2009 10701260@unknown@formal@none@1@S@They are often recognizable by the initial Arabic article ''a''(''l'')''-'', and include many common 
words such as ''aldeia'' "village" from الضيعة ''aldaya'', ''alface'' "lettuce" from الخس ''alkhass'', ''armazém'' "warehouse" from المخزن ''almahazan'', and ''azeite'' "olive oil" from زيت ''azzait''.@@@@1@39@@danf@17-8-2009 10701270@unknown@formal@none@1@S@From Arabic also came the grammatically peculiar word [[Insha'Allah|''oxalá'']] "hopefully".@@@@1@10@@danf@17-8-2009 10701280@unknown@formal@none@1@S@The Mozambican currency name [[Mozambican Metical|''metical'']] was derived from the word مطقال ''miṭqāl'', a unit of weight.@@@@1@17@@danf@17-8-2009 10701290@unknown@formal@none@1@S@The word Mozambique itself is from the Arabic name of sultan Muça Alebique (Musa Alibiki).@@@@1@15@@danf@17-8-2009 10701300@unknown@formal@none@1@S@The name of the Portuguese town of [[Fátima, Portugal|Fátima]] comes from the name of one of the daughters of the prophet [[Muhammad]].@@@@1@22@@danf@17-8-2009 10701310@unknown@formal@none@1@S@Starting in the 15th century, the Portuguese maritime explorations led to the introduction of many loanwords from [[Asia]]n languages.@@@@1@19@@danf@17-8-2009 10701320@unknown@formal@none@1@S@For instance, ''catana'' "cutlass" from Japanese ''katana''; ''corja'' "rabble" from Malay ''kórchchu''; and ''chá'' "tea" from [[Chinese language|Chinese]] ''[[Tea#The word tea|''chá'']]''.@@@@1@21@@danf@17-8-2009 10701330@unknown@formal@none@1@S@From South America came ''batata'' "[[potato]]", from [[Taino]]; ''ananás'' and ''abacaxi'', from [[Tupi-Guarani]] ''naná'' and [[Tupi language|Tupi]] ''ibá cati'', respectively (two species of [[pineapple]]), and ''tucano'' "[[toucan]]" from [[Guarani language|Guarani]] ''tucan''.@@@@1@31@@danf@17-8-2009 10701340@unknown@formal@none@1@S@See [[List of Brazil state name etymologies]] for some more examples.@@@@1@11@@danf@17-8-2009 10701350@unknown@formal@none@1@S@From the 16th to the 19th century, owing to the role of Portugal as an intermediary in the [[Atlantic slave trade]] and the establishment of large Portuguese colonies in Angola, Mozambique, and Brazil, Portuguese acquired several words of African and [[indigenous peoples of Brazil|Amerind]] origin, especially names for most of the animals and plants found in those territories.@@@@1@55@@danf@17-8-2009 10701360@unknown@formal@none@1@S@While those terms are mostly used in the former colonies, many became current in European Portuguese as well.@@@@1@18@@danf@17-8-2009 10701370@unknown@formal@none@1@S@From [[Kimbundu language|Kimbundu]], for example, came ''kifumate'' → ''cafuné'' "head caress", ''kusula'' → ''caçula'' "youngest child", ''marimbondo'' "tropical wasp", and ''kubungula'' → ''bungular'' "to dance like a wizard".@@@@1@28@@danf@17-8-2009 10701380@unknown@formal@none@1@S@Finally, it has received a steady influx of loanwords from other European languages.@@@@1@13@@danf@17-8-2009 10701390@unknown@formal@none@1@S@For example, ''melena'' "hair lock", ''fiambre'' "wet-cured ham" (in contrast with ''presunto'' "dry-cured ham" from Latin ''prae-exsuctus'' "dehydrated"), and ''castelhano'' "Castilian", from Spanish; ''colchete''/''crochê'' "bracket"/"crochet", ''paletó'' "jacket", ''batom'' "lipstick", and ''filé''/''filete'' "steak"/"slice" respectively, from French ''crochet'', ''paletot'', ''bâton'', ''filet''; ''macarrão'' "pasta", ''piloto'' "pilot", ''carroça'' "carriage", and ''barraca'' "barrack", from Italian ''maccherone'', ''pilota'', ''carrozza'', ''baracca''; and ''bife'' "steak", ''futebol'', ''revólver'', ''estoque'', ''folclore'', from English ''beef'', ''football'', ''revolver'', 
''stock'', ''folklore''.@@@@1@68@@danf@17-8-2009 10701400@unknown@formal@none@1@S@==Classification and related languages==@@@@1@4@@danf@17-8-2009 10701410@unknown@formal@none@1@S@Portuguese belongs to the [[West Iberian languages|West Iberian]] branch of the [[Romance language]]s, and it has special ties with the following members of this group:@@@@1@25@@danf@17-8-2009 10701420@unknown@formal@none@1@S@* [[Galician language|Galician]] and the [[Fala language|Fala]], its closest relatives.@@@@1@10@@danf@17-8-2009 10701430@unknown@formal@none@1@S@See below.@@@@1@2@@danf@17-8-2009 10701440@unknown@formal@none@1@S@* [[Spanish language|Spanish]], the major language closest to Portuguese.@@@@1@9@@danf@17-8-2009 10701450@unknown@formal@none@1@S@(See also [[Differences between Spanish and Portuguese]].)@@@@1@7@@danf@17-8-2009 10701460@unknown@formal@none@1@S@* [[Mirandese language|Mirandese]], another West Iberian language spoken in Portugal.@@@@1@10@@danf@17-8-2009 10701470@unknown@formal@none@1@S@* [[Judeo-Portuguese]] and [[Ladino language|Judeo-Spanish]], languages spoken by [[Sephardic Jew]]s, which remained close to Portuguese and Spanish.@@@@1@17@@danf@17-8-2009 10701480@unknown@formal@none@1@S@Despite the obvious lexical and grammatical similarities between Portuguese and other Romance languages, it is not [[mutually intelligible]] with most of them.@@@@1@22@@danf@17-8-2009 10701490@unknown@formal@none@1@S@Apart from Galician, Portuguese speakers will usually need some formal study of basic grammar and vocabulary, before attaining a reasonable level of comprehension of those languages, and vice-versa.@@@@1@28@@danf@17-8-2009 10701500@unknown@formal@none@1@S@===Galician and the Fala===@@@@1@4@@danf@17-8-2009 10701510@unknown@formal@none@1@S@The closest language to Portuguese is Galician, spoken in the autonomous community of Galicia (northwestern Spain).@@@@1@16@@danf@17-8-2009 10701520@unknown@formal@none@1@S@The two were at one time a single language, known today as [[Galician-Portuguese]], but since the political separation of Portugal from Galicia they have diverged somewhat, especially in pronunciation and vocabulary.@@@@1@31@@danf@17-8-2009 10701530@unknown@formal@none@1@S@Nevertheless, the core vocabulary and grammar of Galician are still noticeably closer to Portuguese than to Spanish.@@@@1@17@@danf@17-8-2009 10701540@unknown@formal@none@1@S@In particular, like Portuguese, it uses the future subjunctive, the personal infinitive, and the synthetic pluperfect (see the section on the grammar of Portuguese, below).@@@@1@25@@danf@17-8-2009 10701550@unknown@formal@none@1@S@Mutual intelligibility (estimated at 85% by R. A. 
Hall, Jr., 1989) is good between Galicians and northern Portuguese, but poorer between Galicians and speakers from central Portugal.@@@@1@27@@danf@17-8-2009 10701560@unknown@formal@none@1@S@The Fala language is another descendant of Galician-Portuguese, spoken by a small number of people in the Spanish towns of Valverdi du Fresnu, As Ellas and Sa Martín de Trebellu (autonomous community of [[Extremadura]], near the border with Portugal).@@@@1@39@@danf@17-8-2009 10701570@unknown@formal@none@1@S@===Influence on other languages===@@@@1@4@@danf@17-8-2009 10701580@unknown@formal@none@1@S@Many languages have [[loanword|borrowed words]] from Portuguese, such as [[Bahasa Indonesia|Indonesian]], [[Sri Lanka]]n [[Sri Lanka Tamils (native)|Tamil]] and [[Sinhalese language|Sinhalese]] (see [[Sri Lanka Indo-Portuguese language|Sri Lanka Indo-Portuguese]]), [[Malay language|Malay]], [[Bengali language|Bengali]], [[English (language)|English]], [[Hindi]], [[Konkani language|Konkani]], [[Marathi language|Marathi]], [[Tetum language|Tetum]], [[Tsonga language|Xitsonga]], [[Papiamentu]], [[Japanese language|Japanese]], [[Barbadian|Bajan Creole]] (Spoken in Barbados), [[Lanc-Patuá]] (spoken in northern Brazil) and [[Sranan Tongo]] (spoken in Suriname).@@@@1@61@@danf@17-8-2009 10701590@unknown@formal@none@1@S@It left a strong influence on the ''[[Old Tupi|língua brasílica]]'', a [[Tupi-Guarani|Tupi-Guarani language]] which was the most widely spoken in [[Brazil]] until the 18th century, and on the language spoken around [[Sikka]] in [[Flores|Flores Island]], [[Indonesia]].@@@@1@36@@danf@17-8-2009 10701600@unknown@formal@none@1@S@In nearby [[Larantuka]], Portuguese is used for prayers in [[Holy Week]] rituals.@@@@1@12@@danf@17-8-2009 10701610@unknown@formal@none@1@S@The Japanese-Portuguese dictionary ''[[Nippo Jisho]]'' (1603) was the first dictionary of Japanese in a European language, a product of [[Society of Jesus|Jesuit]] missionary activity in [[Japan]].@@@@1@26@@danf@17-8-2009 10701620@unknown@formal@none@1@S@Building on the work of earlier Portuguese missionaries, the ''Dictionarium Anamiticum, Lusitanum et Latinum'' (Annamite-Portuguese-Latin dictionary) of [[Alexandre de Rhodes]] (1651) introduced the modern [[Vietnamese alphabet|orthography of Vietnamese]], which is based on the orthography of 17th-century Portuguese.@@@@1@37@@danf@17-8-2009 10701630@unknown@formal@none@1@S@The [[Romanization]] of [[Chinese language|Chinese]] was also influenced by the Portuguese language (among others), particularly regarding [[List of common Chinese surnames|Chinese surnames]]; one example is ''Mei''.@@@@1@26@@danf@17-8-2009 10701640@unknown@formal@none@1@S@See also [[List of English words of Portuguese origin]], [[Loan words in Indonesian]], [[Japanese words of Portuguese origin]], [[Malay_language#Borrowed_words|Borrowed words in Malay]], [[Sinhala words of Portuguese origin]], [[Loan words in Sri Lankan Tamil#Portuguese|Loan words from Portuguese in Sri Lankan Tamil]].@@@@1@40@@danf@17-8-2009 10701650@unknown@formal@none@1@S@===Derived languages===@@@@1@2@@danf@17-8-2009 10701660@unknown@formal@none@1@S@Beginning in the 16th century, the extensive contacts between Portuguese travelers and settlers, African slaves, and local populations led to the appearance of many [[pidgin]]s with varying amounts of Portuguese influence.@@@@1@31@@danf@17-8-2009 10701670@unknown@formal@none@1@S@As these pidgins became the mother tongue of succeeding generations, they evolved into fully fledged [[creole language]]s, which 
remained in use in many parts of Asia and Africa until the 18th century.@@@@1@32@@danf@17-8-2009 10701680@unknown@formal@none@1@S@Some Portuguese-based or Portuguese-influenced creoles are still spoken today, by over 3 million people worldwide, especially people of partial [[Portuguese people|Portuguese]] ancestry.@@@@1@22@@danf@17-8-2009 10701690@unknown@formal@none@1@S@== Phonology ==@@@@1@3@@danf@17-8-2009 10701700@unknown@formal@none@1@S@There is a maximum of 9 oral vowels and 19 consonants, though some varieties of the language have fewer phonemes (Brazilian Portuguese has only 8 oral vowel [[phone]]s).@@@@1@28@@danf@17-8-2009 10701710@unknown@formal@none@1@S@There are also five nasal vowels, which some linguists regard as allophones of the oral vowels, ten oral [[diphthong]]s, and five nasal diphthongs.@@@@1@23@@danf@17-8-2009 10701720@unknown@formal@none@1@S@===Vowels===@@@@1@1@@danf@17-8-2009 10701730@unknown@formal@none@1@S@To the seven vowels of [[Vulgar Latin]], European Portuguese has added two [[Mid-centralized vowel|near central vowels]], one of which tends to be [[elision|elided]] in [[relaxed pronunciation|rapid speech]], like the ''e caduc'' of [[French language|French]] (represented either as {{IPA|/ɯ̽/}}, or {{IPA|/ɨ/}}, or {{IPA|/ə/}}).@@@@1@42@@danf@17-8-2009 10701740@unknown@formal@none@1@S@The high vowels {{IPA|/e o/}} and the low vowels {{IPA|/ɛ ɔ/}} are four distinct phonemes, and they alternate in various forms of [[apophony]].@@@@1@23@@danf@17-8-2009 10701750@unknown@formal@none@1@S@Like [[Catalan language|Catalan]], Portuguese uses vowel quality to contrast stressed syllables with unstressed syllables: isolated vowels tend to be [[Vowel#Height|raised]], and in some cases centralized, when unstressed.@@@@1@27@@danf@17-8-2009 10701760@unknown@formal@none@1@S@Nasal diphthongs occur mostly at the end of words.@@@@1@9@@danf@17-8-2009 10701770@unknown@formal@none@1@S@===Consonants===@@@@1@1@@danf@17-8-2009 10701780@unknown@formal@none@1@S@The consonant inventory of Portuguese is fairly conservative.@@@@1@8@@danf@17-8-2009 10701790@unknown@formal@none@1@S@The medieval affricates {{IPA|/ts/}}, {{IPA|/dz/}}, {{IPA|/tʃ/}}, {{IPA|/dʒ/}} merged with the fricatives {{IPA|/s/}}, {{IPA|/z/}}, {{IPA|/ʃ/}}, {{IPA|/ʒ/}}, respectively, but not with each other, and there were no other significant changes to the consonant phonemes since then.@@@@1@34@@danf@17-8-2009 10701800@unknown@formal@none@1@S@However, some remarkable dialectal variants and [[allophone]]s have appeared, among which:@@@@1@11@@danf@17-8-2009 10701810@unknown@formal@none@1@S@*In many regions of Brazil, {{IPA|/t/}} and {{IPA|/d/}} have the affricate allophones {{IPA|[tʃ]}} and {{IPA|[dʒ]}}, respectively, before {{IPA|/i/}} and {{IPA|/ĩ/}}.@@@@1@20@@danf@17-8-2009 10701820@unknown@formal@none@1@S@([[Quebec French]] has a similar phenomenon, with alveolar affricates instead of postalveolars.@@@@1@12@@danf@17-8-2009 10701830@unknown@formal@none@1@S@[[Japanese language|Japanese]] is another example).@@@@1@5@@danf@17-8-2009 10701840@unknown@formal@none@1@S@*At the end of a syllable, the phoneme {{IPA|/l/}} has the allophone {{IPA|[u̯]}} in Brazilian Portuguese (''[[L-vocalization#L-vocalization|L-vocalization]]'').@@@@1@17@@danf@17-8-2009 10701850@unknown@formal@none@1@S@*In many parts of Brazil and Angola, intervocalic {{IPA|/ɲ/}} is pronounced as a [[nasalization|nasalized]] [[palatal approximant]] {{IPA|[j̃]}} which nasalizes the preceding vowel, so that for instance {{IPA|/ˈniɲu/}} is pronounced 
{{IPA|[ˈnĩj̃u]}}.@@@@1@30@@danf@17-8-2009 10701860@unknown@formal@none@1@S@*In most of Brazil, the alveolar sibilants {{IPA|/s/}} and {{IPA|/z/}} occur in complementary distribution at the end of syllables, depending on whether the consonant that follows is voiceless or voiced, as in English.@@@@1@33@@danf@17-8-2009 10701870@unknown@formal@none@1@S@But in most of Portugal and parts of Brazil sibilants are postalveolar at the end of syllables, {{IPA|/ʃ/}} before voiceless consonants, and {{IPA|/ʒ/}} before voiced consonants (in [[Ladino language|Judeo-Spanish]], {{IPA|/s/}} is often replaced with {{IPA|/ʃ/}} at the end of syllables, too).@@@@1@41@@danf@17-8-2009 10701880@unknown@formal@none@1@S@*There is considerable dialectal variation in the value of the [[Rhotic consonant|rhotic]] phoneme {{IPA|/ʁ/}}.@@@@1@14@@danf@17-8-2009 10701890@unknown@formal@none@1@S@See [[Guttural R#Portuguese|Guttural R in Portuguese]], for details.@@@@1@8@@danf@17-8-2009 10701900@unknown@formal@none@1@S@==Grammar==@@@@1@1@@danf@17-8-2009 10701910@unknown@formal@none@1@S@A particularly interesting aspect of the grammar of Portuguese is the verb.@@@@1@12@@danf@17-8-2009 10701920@unknown@formal@none@1@S@Morphologically, more verbal inflections from classical Latin have been preserved by Portuguese than any other major Romance language.@@@@1@18@@danf@17-8-2009 10701930@unknown@formal@none@1@S@See [[Romance copula#Morphological comparison|Romance copula]], for a detailed comparison.@@@@1@9@@danf@17-8-2009 10701940@unknown@formal@none@1@S@It has also some innovations not found in other Romance languages (except Galician and the Fala):@@@@1@16@@danf@17-8-2009 10701950@unknown@formal@none@1@S@* The [[present perfect tense]] has an iterative sense unique among the Romance languages.@@@@1@14@@danf@17-8-2009 10701960@unknown@formal@none@1@S@It denotes an action or a series of actions which began in the past and are expected to keep repeating in the future.@@@@1@23@@danf@17-8-2009 10701970@unknown@formal@none@1@S@For instance, the sentence ''Tenho tentado falar com ela'' would be translated to "I have been trying to talk to her", not "I have tried to talk to her".@@@@1@29@@danf@17-8-2009 10701980@unknown@formal@none@1@S@On the other hand, the correct translation of the question "Have you heard the latest news?" 
is not ''*Tem ouvido a última notícia?'', but ''Ouviu a última notícia?'', since no repetition is implied.@@@@1@33@@danf@17-8-2009 10701990@unknown@formal@none@1@S@* The future [[Subjunctive mood|subjunctive]] tense, which was developed by medieval [[West Iberian languages|West Iberian Romance]], but has now fallen into disuse in Spanish, is still used in [[vernacular]] Portuguese.@@@@1@30@@danf@17-8-2009 10702000@unknown@formal@none@1@S@It appears in dependent clauses that denote a condition which must be fulfilled in the future, so that the independent clause will occur.@@@@1@23@@danf@17-8-2009 10702010@unknown@formal@none@1@S@Other languages normally employ the present tense under the same circumstances:@@@@1@11@@danf@17-8-2009 10702020@unknown@formal@none@1@S@:''Se ''for'' eleito presidente, mudarei a lei.''@@@@1@7@@danf@17-8-2009 10702030@unknown@formal@none@1@S@:If ''I am'' elected president, I will change the law.@@@@1@10@@danf@17-8-2009 10702040@unknown@formal@none@1@S@:''Quando ''fores'' mais velho, vais entender.''@@@@1@6@@danf@17-8-2009 10702050@unknown@formal@none@1@S@:When ''you are'' older, you will understand.@@@@1@7@@danf@17-8-2009 10702060@unknown@formal@none@1@S@* The personal [[infinitive]]: infinitives can [[inflection|inflect]] according to their subject in [[Grammatical person|person]] and [[Grammatical number|number]], often showing who is expected to perform a certain action; cf. ''É melhor voltares'' "It is better [for you] to go back," ''É melhor voltarmos'' "It is better [for us] to go back."@@@@1@50@@danf@17-8-2009 10702070@unknown@formal@none@1@S@Perhaps for this reason, infinitive clauses replace subjunctive clauses more often in Portuguese than in other Romance languages.@@@@1@18@@danf@17-8-2009 10702080@unknown@formal@none@1@S@==Writing system==@@@@1@2@@danf@17-8-2009 10702090@unknown@formal@none@1@S@Portuguese is written with the [[Latin alphabet]], making use of five [[diacritic]]s to denote stress, vowel height, contraction, nasalization, and other sound changes (acute accent, grave accent, circumflex accent, tilde, and cedilla).@@@@1@32@@danf@17-8-2009 10702100@unknown@formal@none@1@S@[[Brazilian Portuguese]] also uses the diaeresis mark.@@@@1@7@@danf@17-8-2009 10702110@unknown@formal@none@1@S@Accented characters and digraphs are not counted as separate letters for [[collation]] purposes.@@@@1@13@@danf@17-8-2009 10702120@unknown@formal@none@1@S@===Brazilian vs. European spelling===@@@@1@4@@danf@17-8-2009 10702130@unknown@formal@none@1@S@There are some minor differences between the orthographies of Brazil and other Portuguese language countries.@@@@1@15@@danf@17-8-2009 10702140@unknown@formal@none@1@S@One of the most pervasive is the use of acute accents in the European/African/Asian orthography in many words such as ''sinónimo'', where the Brazilian orthography has a circumflex accent, ''sinônimo''.@@@@1@30@@danf@17-8-2009 10702150@unknown@formal@none@1@S@Another important difference is that Brazilian spelling often lacks ''c'' or ''p'' before ''c'', ''ç'', or ''t'', where the European orthography has them; for example, cf. 
Brazilian ''fato'' with European ''facto'', "fact", or Brazilian ''objeto'' with European ''objecto'', "object".@@@@1@39@@danf@17-8-2009 10702160@unknown@formal@none@1@S@Some of these spelling differences reflect differences in the pronunciation of the words, but others are merely graphic.@@@@1@18@@danf@17-8-2009 10702170@unknown@formal@none@1@S@==Examples==@@@@1@1@@danf@17-8-2009 10702180@unknown@formal@none@1@S@;Excerpt from the Portuguese [[national epic]] ''[[Os Lusíadas]]'', by author [[Luís de Camões]] (I, 33)@@@@1@15@@danf@17-8-2009 10710010@unknown@formal@none@1@S@
Predictive analytics
@@@@1@2@@danf@17-8-2009 10710020@unknown@formal@none@1@S@'''Predictive analytics''' encompasses a variety of techniques from [[statistics]] and [[data mining]] that analyze current and historical data to make predictions about future events.@@@@1@24@@danf@17-8-2009 10710030@unknown@formal@none@1@S@Such predictions rarely take the form of absolute statements, and are more likely to be expressed as values that correspond to the odds of a particular event or behavior taking place in the future.@@@@1@34@@danf@17-8-2009 10710040@unknown@formal@none@1@S@In business, predictive models exploit patterns found in historical and transactional data to identify risks and opportunities.@@@@1@17@@danf@17-8-2009 10710050@unknown@formal@none@1@S@Models capture relationships among many factors to allow assessment of risk or potential associated with a particular set of conditions, guiding decision making for candidate transactions.@@@@1@26@@danf@17-8-2009 10710060@unknown@formal@none@1@S@One of the most well-known applications is [[credit scoring]], which is used throughout [[financial services]].@@@@1@15@@danf@17-8-2009 10710070@unknown@formal@none@1@S@Scoring models process a customer’s [[credit history]], [[loan application]], customer data, etc., in order to rank-order individuals by their likelihood of making future credit payments on time.@@@@1@27@@danf@17-8-2009 10710080@unknown@formal@none@1@S@Predictive analytics are also used in [[insurance]], [[telecommunications]], [[retail]], [[travel]], [[healthcare]], [[Pharmaceutical company|pharmaceuticals]] and other fields.@@@@1@16@@danf@17-8-2009 10710090@unknown@formal@none@1@S@== Types of predictive analytics ==@@@@1@6@@danf@17-8-2009 10710100@unknown@formal@none@1@S@Generally, predictive analytics is used to mean [[predictive modeling]], scoring of predictive models, and [[forecasting]].@@@@1@15@@danf@17-8-2009 10710110@unknown@formal@none@1@S@However, people are increasingly using the term to describe related analytic disciplines, such as descriptive modeling and decision modeling or optimization.@@@@1@21@@danf@17-8-2009 10710120@unknown@formal@none@1@S@These disciplines also involve rigorous data analysis, and are widely used in business for segmentation and decision making, but have different purposes and the statistical techniques underlying them vary.@@@@1@29@@danf@17-8-2009 10710130@unknown@formal@none@1@S@===Predictive models===@@@@1@2@@danf@17-8-2009 10710140@unknown@formal@none@1@S@Predictive models analyze past performance to assess how likely a customer is to exhibit a specific behavior in the future in order to improve [[marketing effectiveness]].@@@@1@26@@danf@17-8-2009 10710150@unknown@formal@none@1@S@This category also encompasses models that seek out subtle data patterns to answer questions about customer performance, such as fraud detection models.@@@@1@22@@danf@17-8-2009 10710160@unknown@formal@none@1@S@Predictive models often perform calculations during live transactions, for example, to evaluate the risk or opportunity of a given customer or transaction, in order to guide a decision.@@@@1@28@@danf@17-8-2009 10710170@unknown@formal@none@1@S@===Descriptive models===@@@@1@2@@danf@17-8-2009 10710180@unknown@formal@none@1@S@Descriptive models “describe” relationships in data in a way that is often used to classify customers or prospects into groups.@@@@1@20@@danf@17-8-2009 10710190@unknown@formal@none@1@S@Unlike predictive models that focus on predicting a single customer behavior (such as credit risk), descriptive models identify many 
different relationships between customers or products.@@@@1@25@@danf@17-8-2009 10710200@unknown@formal@none@1@S@But the descriptive models do not rank-order customers by their likelihood of taking a particular action the way predictive models do.@@@@1@21@@danf@17-8-2009 10710210@unknown@formal@none@1@S@Descriptive models are often used “offline,” for example, to categorize customers by their product preferences and life stage.@@@@1@18@@danf@17-8-2009 10710220@unknown@formal@none@1@S@Descriptive modeling tools can be utilized to develop agent-based models that can simulate large numbers of individualized agents to predict possible futures.@@@@1@23@@danf@17-8-2009 10710230@unknown@formal@none@1@S@===Decision models===@@@@1@2@@danf@17-8-2009 10710240@unknown@formal@none@1@S@Decision models describe the relationship between all the elements of a decision — the known data (including results of predictive models), the decision and the forecast results of the decision — in order to predict the results of decisions involving many variables.@@@@1@42@@danf@17-8-2009 10710250@unknown@formal@none@1@S@These models can be used in optimization, a data-driven approach to improving decision logic that involves maximizing certain outcomes while minimizing others.@@@@1@22@@danf@17-8-2009 10710260@unknown@formal@none@1@S@Decision models are generally used offline, to develop decision logic or a set of business rules that will produce the desired action for every customer or circumstance.@@@@1@27@@danf@17-8-2009 10710270@unknown@formal@none@1@S@== Predictive analytics ==@@@@1@4@@danf@17-8-2009 10710280@unknown@formal@none@1@S@===Definition===@@@@1@1@@danf@17-8-2009 10710290@unknown@formal@none@1@S@Predictive analytics is an area of statistical analysis that deals with extracting information from data and using it to predict future trends and behavior patterns.@@@@1@25@@danf@17-8-2009 10710300@unknown@formal@none@1@S@The core of predictive analytics relies on capturing relationships between explanatory variables and the predicted variables from past occurrences, and exploiting them to predict future outcomes.@@@@1@26@@danf@17-8-2009 10710310@unknown@formal@none@1@S@===Current uses===@@@@1@2@@danf@17-8-2009 10710320@unknown@formal@none@1@S@Although predictive analytics can be put to use in many applications, we outline a few examples where predictive analytics has shown positive impact in recent years.@@@@1@26@@danf@17-8-2009 10710330@unknown@formal@none@1@S@====Analytical Customer Relationship Management (CRM)====@@@@1@5@@danf@17-8-2009 10710340@unknown@formal@none@1@S@Analytical [[Customer Relationship Management]] is a frequent commercial application of predictive analysis.@@@@1@12@@danf@17-8-2009 10710350@unknown@formal@none@1@S@Methods of predictive analysis are applied to customer data to pursue CRM objectives.@@@@1@13@@danf@17-8-2009 10710360@unknown@formal@none@1@S@====Direct marketing====@@@@1@2@@danf@17-8-2009 10710370@unknown@formal@none@1@S@Product [[marketing]] is constantly faced with the challenge of coping with the increasing number of competing products, different consumer preferences and the variety of methods (channels) available to interact with each consumer.@@@@1@32@@danf@17-8-2009 10710380@unknown@formal@none@1@S@Efficient marketing is a process of understanding the amount of variability and tailoring the marketing strategy for greater profitability.@@@@1@19@@danf@17-8-2009 10710390@unknown@formal@none@1@S@Predictive analytics can help identify consumers with a higher likelihood of responding 
to a particular marketing offer.@@@@1@17@@danf@17-8-2009 10710400@unknown@formal@none@1@S@Models can be built using data from consumers’ past purchasing history and past response rates for each channel.@@@@1@18@@danf@17-8-2009 10710410@unknown@formal@none@1@S@Additional information about the consumers’ demographic, geographic and other characteristics can be used to make more accurate predictions.@@@@1@18@@danf@17-8-2009 10710420@unknown@formal@none@1@S@Targeting only these consumers can lead to a substantial increase in response rate, which in turn can significantly reduce the cost per acquisition.@@@@1@23@@danf@17-8-2009 10710430@unknown@formal@none@1@S@Apart from identifying prospects, predictive analytics can also help to identify the most effective combination of products and marketing channels that should be used to target a given consumer.@@@@1@29@@danf@17-8-2009 10710440@unknown@formal@none@1@S@====Cross-sell====@@@@1@1@@danf@17-8-2009 10710450@unknown@formal@none@1@S@Often corporate organizations collect and maintain abundant data (e.g. customer records, sales transactions) and exploiting hidden relationships in the data can provide a competitive advantage to the organization.@@@@1@28@@danf@17-8-2009 10710460@unknown@formal@none@1@S@For an organization that offers multiple products, an analysis of existing customer behavior can lead to efficient [[cross-selling|cross sell]] of products.@@@@1@21@@danf@17-8-2009 10710470@unknown@formal@none@1@S@This directly leads to higher profitability per customer and strengthening of the customer relationship.@@@@1@14@@danf@17-8-2009 10710480@unknown@formal@none@1@S@Predictive analytics can help analyze customers’ spending, usage and other behavior, and help cross-sell the right product at the right time.@@@@1@21@@danf@17-8-2009 10710490@unknown@formal@none@1@S@====Customer retention====@@@@1@2@@danf@17-8-2009 10710500@unknown@formal@none@1@S@With the amount of competing services available, businesses need to focus efforts on maintaining continuous [[consumer satisfaction]].@@@@1@17@@danf@17-8-2009 10710510@unknown@formal@none@1@S@In such a competitive scenario, [[consumer loyalty]] needs to be rewarded and [[customer attrition]] needs to be minimized.@@@@1@18@@danf@17-8-2009 10710520@unknown@formal@none@1@S@Businesses tend to respond to customer attrition on a reactive basis, acting only after the customer has initiated the process to terminate service.@@@@1@23@@danf@17-8-2009 10710530@unknown@formal@none@1@S@At this stage, it is almost impossible to change the customer’s decision.@@@@1@13@@danf@17-8-2009 10710540@unknown@formal@none@1@S@Proper application of predictive analytics can lead to a more proactive retention strategy.@@@@1@13@@danf@17-8-2009 10710550@unknown@formal@none@1@S@By a frequent examination of a customer’s past service usage, service performance, spending and other behavior patterns, predictive models can determine the likelihood of a customer wanting to terminate service sometime in the near future.@@@@1@35@@danf@17-8-2009 10710560@unknown@formal@none@1@S@An intervention with lucrative offers can increase the chance of retaining the customer.@@@@1@13@@danf@17-8-2009 10710570@unknown@formal@none@1@S@Silent attrition, in which a customer slowly but steadily reduces usage, is another problem faced by many companies.@@@@1@22@@danf@17-8-2009 10710580@unknown@formal@none@1@S@Predictive analytics can also predict this behavior accurately and before it occurs, so that the company can take proper actions to 
increase customer activity.@@@@1@24@@danf@17-8-2009 10710590@unknown@formal@none@1@S@====Underwriting====@@@@1@1@@danf@17-8-2009 10710600@unknown@formal@none@1@S@Many businesses have to account for risk exposure due to their different services and determine the cost needed to cover the risk.@@@@1@22@@danf@17-8-2009 10710610@unknown@formal@none@1@S@For example, auto insurance providers need to accurately determine the amount of premium to charge to cover each automobile and driver.@@@@1@21@@danf@17-8-2009 10710620@unknown@formal@none@1@S@A financial company needs to assess a borrower’s potential and ability to pay before granting a loan.@@@@1@17@@danf@17-8-2009 10710630@unknown@formal@none@1@S@For a health insurance provider, predictive analytics can analyze a few years of past medical claims data, as well as lab, pharmacy and other records where available, to predict how expensive an enrollee is likely to be in the future.@@@@1@40@@danf@17-8-2009 10710640@unknown@formal@none@1@S@Predictive analytics can help with the [[underwriting]] of these quantities by predicting the chances of illness, [[Default (finance)|default]], [[bankruptcy]], etc.@@@@1@18@@danf@17-8-2009 10710650@unknown@formal@none@1@S@Predictive analytics can streamline the process of customer acquisition by predicting the future risk behavior of a customer using application-level data.@@@@1@22@@danf@17-8-2009 10710660@unknown@formal@none@1@S@Proper predictive analytics can lead to proper pricing decisions, which can help mitigate future risk of default.@@@@1@17@@danf@17-8-2009 10710670@unknown@formal@none@1@S@====Collection analytics====@@@@1@2@@danf@17-8-2009 10710680@unknown@formal@none@1@S@Every portfolio has a set of delinquent customers who do not make their payments on time.@@@@1@16@@danf@17-8-2009 10710690@unknown@formal@none@1@S@The financial institution has to undertake collection activities on these customers to recover the amounts due.@@@@1@16@@danf@17-8-2009 10710700@unknown@formal@none@1@S@A lot of collection resources are wasted on customers who are difficult or impossible to recover.@@@@1@16@@danf@17-8-2009 10710710@unknown@formal@none@1@S@Predictive analytics can help optimize the allocation of collection resources by identifying the most effective collection agencies, contact strategies, legal actions and other strategies for each customer, thus significantly increasing recovery while at the same time reducing collection costs.@@@@1@38@@danf@17-8-2009 10710720@unknown@formal@none@1@S@====Fraud detection====@@@@1@2@@danf@17-8-2009 10710730@unknown@formal@none@1@S@Fraud is a big problem for many businesses and can be of various types.@@@@1@14@@danf@17-8-2009 10710740@unknown@formal@none@1@S@Inaccurate credit applications, fraudulent transactions, [[identity theft]]s and false insurance claims are some examples of this problem.@@@@1@17@@danf@17-8-2009 10710750@unknown@formal@none@1@S@These problems plague firms all across the spectrum and some examples of likely victims are [[Credit card fraud|credit card issuers]], insurance companies, retail merchants, manufacturers, business-to-business suppliers and even service providers.@@@@1@33@@danf@17-8-2009 10710760@unknown@formal@none@1@S@This is an area where a predictive model is often used to help weed out the “bads” and reduce a business's exposure to fraud.@@@@1@24@@danf@17-8-2009 10710770@unknown@formal@none@1@S@====Portfolio, product or economy level prediction====@@@@1@6@@danf@17-8-2009 10710780@unknown@formal@none@1@S@Often the focus of analysis is not the consumer but the 
product, portfolio, firm, industry or even the economy.@@@@1@19@@danf@17-8-2009 10710790@unknown@formal@none@1@S@For example, a retailer might be interested in predicting store-level demand for inventory management purposes.@@@@1@16@@danf@17-8-2009 10710800@unknown@formal@none@1@S@Or the Federal Reserve Board might be interested in predicting the unemployment rate for the next year.@@@@1@17@@danf@17-8-2009 10710810@unknown@formal@none@1@S@These types of problems can be addressed by predictive analytics using Time Series techniques (see below).@@@@1@16@@danf@17-8-2009 10710830@unknown@formal@none@1@S@==Statistical techniques==@@@@1@2@@danf@17-8-2009 10710840@unknown@formal@none@1@S@The approaches and techniques used to conduct predictive analytics can broadly be grouped into regression techniques and machine learning techniques.@@@@1@20@@danf@17-8-2009 10710850@unknown@formal@none@1@S@====Regression Techniques====@@@@1@2@@danf@17-8-2009 10710860@unknown@formal@none@1@S@Regression models are the mainstay of predictive analytics.@@@@1@8@@danf@17-8-2009 10710870@unknown@formal@none@1@S@The focus lies on establishing a mathematical equation as a model to represent the interactions between the different variables in consideration.@@@@1@21@@danf@17-8-2009 10710880@unknown@formal@none@1@S@Depending on the situation, there is a wide variety of models that can be applied while performing predictive analytics.@@@@1@19@@danf@17-8-2009 10710890@unknown@formal@none@1@S@Some of them are briefly discussed below.@@@@1@7@@danf@17-8-2009 10710900@unknown@formal@none@1@S@=====Linear Regression Model=====@@@@1@3@@danf@17-8-2009 10710910@unknown@formal@none@1@S@The linear regression model analyzes the relationship between the response or dependent variable and a set of independent or predictor variables.@@@@1@21@@danf@17-8-2009 10710920@unknown@formal@none@1@S@This relationship is expressed as an equation that predicts the response variable as a linear function of the parameters.@@@@1@19@@danf@17-8-2009 10710930@unknown@formal@none@1@S@These parameters are adjusted so that a measure of fit is optimized.@@@@1@12@@danf@17-8-2009 10710940@unknown@formal@none@1@S@Much of the effort in model fitting is focused on minimizing the size of the residual, as well as ensuring that it is randomly distributed with respect to the model predictions.@@@@1@31@@danf@17-8-2009 10710950@unknown@formal@none@1@S@The goal of regression is to select the parameters of the model so as to minimize the sum of the squared residuals.@@@@1@22@@danf@17-8-2009 10710960@unknown@formal@none@1@S@This is referred to as '''[[ordinary least squares]]''' (OLS) estimation and results in best linear unbiased estimates (BLUE) of the parameters if and only if the [[Gauss–Markov theorem|Gauss-Markov]] assumptions are satisfied.@@@@1@31@@danf@17-8-2009 10710970@unknown@formal@none@1@S@Once the model has been estimated, we would like to know whether the predictor variables belong in the model – i.e. 
is the estimate of each variable’s contribution reliable?@@@@1@30@@danf@17-8-2009 10710980@unknown@formal@none@1@S@To do this we can check the statistical significance of the model’s coefficients which can be measured using the t-statistic.@@@@1@20@@danf@17-8-2009 10710990@unknown@formal@none@1@S@This amounts to testing whether the coefficient is significantly different from zero.@@@@1@12@@danf@17-8-2009 10711000@unknown@formal@none@1@S@How well the model predicts the dependent variable based on the value of the independent variables can be assessed by using the R² statistic.@@@@1@24@@danf@17-8-2009 10711010@unknown@formal@none@1@S@It measures predictive power of the model i.e. the proportion of the total variation in the dependent variable that is “explained” (accounted for) by variation in the independent variables.@@@@1@29@@danf@17-8-2009 10711020@unknown@formal@none@1@S@====Discrete choice models====@@@@1@3@@danf@17-8-2009 10711030@unknown@formal@none@1@S@Multivariate regression (above) is generally used when the response variable is continuous and has an unbounded range.@@@@1@17@@danf@17-8-2009 10711040@unknown@formal@none@1@S@Often the response variable may not be continuous but rather discrete.@@@@1@11@@danf@17-8-2009 10711050@unknown@formal@none@1@S@While mathematically it is feasible to apply multivariate regression to discrete ordered dependent variables, some of the assumptions behind the theory of multivariate linear regression no longer hold, and there are other techniques such as discrete choice models which are better suited for this type of analysis.@@@@1@47@@danf@17-8-2009 10711060@unknown@formal@none@1@S@If the dependent variable is discrete, some of those superior methods are [[logistic regression]], [[multinomial logit]] and [[probit]] models.@@@@1@19@@danf@17-8-2009 10711070@unknown@formal@none@1@S@Logistic regression and probit models are used when the dependent variable is [[binary numeral system|binary]].@@@@1@15@@danf@17-8-2009 10711080@unknown@formal@none@1@S@=====Logistic regression=====@@@@1@2@@danf@17-8-2009 10711090@unknown@formal@none@1@S@In a classification setting, assigning outcome probabilities to observations can be achieved through the use of a logistic model, which is basically a method which transforms information about the binary dependent variable into an unbounded continuous variable and estimates a regular multivariate model (See Allison’s Logistic Regression for more information on the theory of Logistic Regression).@@@@1@56@@danf@17-8-2009 10711100@unknown@formal@none@1@S@The [[Wald test|Wald]] and [[likelihood-ratio test]] are used to test the statistical significance of each coefficient b in the model (analogous to the t tests used in OLS regression; see above).@@@@1@31@@danf@17-8-2009 10711110@unknown@formal@none@1@S@A test assessing the goodness-of-fit of a classification model is the [[Hosmer and Lemeshow test]].@@@@1@15@@danf@17-8-2009 10711120@unknown@formal@none@1@S@=====Multinomial logistic regression=====@@@@1@3@@danf@17-8-2009 10711130@unknown@formal@none@1@S@An extension of the [[binary logit model]] to cases where the dependent variable has more than 2 categories is the [[multinomial logit model]].@@@@1@23@@danf@17-8-2009 10711140@unknown@formal@none@1@S@In such cases collapsing the data into two categories might not make good sense or may lead to loss in the richness of the data.@@@@1@25@@danf@17-8-2009 10711150@unknown@formal@none@1@S@The multinomial logit model is the appropriate technique in these cases, especially when the 
dependent variable categories are not ordered (for example, colors like red, blue, and green).@@@@1@27@@danf@17-8-2009 10711160@unknown@formal@none@1@S@Some authors have extended multinomial regression to include feature selection/importance methods such as [[Random multinomial logit]].@@@@1@16@@danf@17-8-2009 10711170@unknown@formal@none@1@S@=====Probit regression=====@@@@1@2@@danf@17-8-2009 10711180@unknown@formal@none@1@S@Probit models offer an alternative to logistic regression for modeling categorical dependent variables.@@@@1@13@@danf@17-8-2009 10711190@unknown@formal@none@1@S@Even though the outcomes tend to be similar, the underlying distributions are different.@@@@1@13@@danf@17-8-2009 10711200@unknown@formal@none@1@S@Probit models are popular in social sciences like economics.@@@@1@9@@danf@17-8-2009 10711210@unknown@formal@none@1@S@A good way to understand the key difference between probit and logit models is to assume that there is a latent variable z.@@@@1@23@@danf@17-8-2009 10711220@unknown@formal@none@1@S@We do not observe z but instead observe y which takes the value 0 or 1.@@@@1@16@@danf@17-8-2009 10711230@unknown@formal@none@1@S@In the logit model we assume that the error term of the latent variable follows a logistic distribution.@@@@1@11@@danf@17-8-2009 10711240@unknown@formal@none@1@S@In the probit model we assume that it follows a standard normal distribution.@@@@1@12@@danf@17-8-2009 10711250@unknown@formal@none@1@S@Note that in social sciences (for example, economics), probit is often used to model situations where the observed variable y is continuous but takes values between 0 and 1.@@@@1@28@@danf@17-8-2009 10711260@unknown@formal@none@1@S@=====Logit vs. Probit=====@@@@1@3@@danf@17-8-2009 10711270@unknown@formal@none@1@S@The probit model has been around longer than the logit model.@@@@1@11@@danf@17-8-2009 10711280@unknown@formal@none@1@S@They look identical, except that the logistic distribution tends to be a little flat tailed.@@@@1@15@@danf@17-8-2009 10711290@unknown@formal@none@1@S@In fact, one of the reasons the logit model was formulated was that the probit model was extremely hard to compute because it involved calculating difficult integrals.@@@@1@27@@danf@17-8-2009 10711300@unknown@formal@none@1@S@Modern computing, however, has made this computation fairly simple.@@@@1@9@@danf@17-8-2009 10711310@unknown@formal@none@1@S@The coefficients obtained from the logit and probit models are also fairly close.@@@@1@13@@danf@17-8-2009 10711320@unknown@formal@none@1@S@However, the odds ratio makes the logit model easier to interpret.@@@@1@11@@danf@17-8-2009 10711330@unknown@formal@none@1@S@For practical purposes the only reasons for choosing the probit model over the logistic model would be:@@@@1@17@@danf@17-8-2009 10711340@unknown@formal@none@1@S@* There is a strong belief that the underlying distribution is normal@@@@1@12@@danf@17-8-2009 10711350@unknown@formal@none@1@S@* The actual event is not a binary outcome (e.g. Bankrupt/not bankrupt) but a proportion (e.g. 
Proportion of population at different debt levels).@@@@1@23@@danf@17-8-2009 10711360@unknown@formal@none@1@S@==== Time series models====@@@@1@4@@danf@17-8-2009 10711370@unknown@formal@none@1@S@[[Time series]] models are used for predicting or forecasting the future behavior of variables.@@@@1@14@@danf@17-8-2009 10711380@unknown@formal@none@1@S@These models account for the fact that data points taken over time may have an internal structure (such as autocorrelation, trend or seasonal variation) that should be accounted for.@@@@1@29@@danf@17-8-2009 10711390@unknown@formal@none@1@S@As a result standard regression techniques cannot be applied to time series data and methodology has been developed to decompose the trend, seasonal and cyclical component of the series.@@@@1@29@@danf@17-8-2009 10711400@unknown@formal@none@1@S@Modeling the dynamic path of a variable can improve forecasts since the predictable component of the series can be projected into the future.@@@@1@23@@danf@17-8-2009 10711410@unknown@formal@none@1@S@Time series models estimate difference equations containing stochastic components.@@@@1@9@@danf@17-8-2009 10711420@unknown@formal@none@1@S@Two commonly used forms of these models are [[autoregressive model]]s (AR) and [[Moving average (technical analysis)|moving average]] (MA) models.@@@@1@19@@danf@17-8-2009 10711430@unknown@formal@none@1@S@The [[Box-Jenkins]] methodology (1976) developed by George Box and G.M. Jenkins combines the AR and MA models to produce the [[Autoregressive moving average model|ARMA]] (autoregressive moving average) model which is the cornerstone of stationary time series analysis.@@@@1@37@@danf@17-8-2009 10711440@unknown@formal@none@1@S@ARIMA (autoregressive integrated moving average models) on the other hand are used to describe non-stationary time series.@@@@1@17@@danf@17-8-2009 10711450@unknown@formal@none@1@S@Box and Jenkins suggest differencing a non stationary time series to obtain a stationary series to which an ARMA model can be applied.@@@@1@23@@danf@17-8-2009 10711460@unknown@formal@none@1@S@Non stationary time series have a pronounced trend and do not have a constant long-run mean or variance.@@@@1@18@@danf@17-8-2009 10711470@unknown@formal@none@1@S@Box and Jenkins proposed a three stage methodology which includes: model identification, estimation and validation.@@@@1@15@@danf@17-8-2009 10711480@unknown@formal@none@1@S@The identification stage involves identifying if the series is stationary or not and the presence of seasonality by examining plots of the series, autocorrelation and partial autocorrelation functions.@@@@1@28@@danf@17-8-2009 10711490@unknown@formal@none@1@S@In the estimation stage, models are estimated using non-linear time series or maximum likelihood estimation procedures.@@@@1@16@@danf@17-8-2009 10711500@unknown@formal@none@1@S@Finally the validation stage involves diagnostic checking such as plotting the residuals to detect outliers and evidence of model fit.@@@@1@20@@danf@17-8-2009 10711510@unknown@formal@none@1@S@In recent years time series models have become more sophisticated and attempt to model conditional heteroskedasticity with models such as ARCH ([[autoregressive conditional heteroskedasticity]]) and GARCH (generalized autoregressive conditional heteroskedasticity) models frequently used for financial time series.@@@@1@37@@danf@17-8-2009 10711520@unknown@formal@none@1@S@In addition time series models are also used to understand inter-relationships among economic variables represented by systems of equations using VAR 
(vector autoregression) and structural VAR models.@@@@1@27@@danf@17-8-2009 10711530@unknown@formal@none@1@S@==== Survival or duration analysis====@@@@1@5@@danf@17-8-2009 10711540@unknown@formal@none@1@S@[[Survival analysis]] is another name for time to event analysis.@@@@1@10@@danf@17-8-2009 10711550@unknown@formal@none@1@S@These techniques were primarily developed in the medical and biological sciences, but they are also widely used in the social sciences like economics, as well as in engineering (reliability and failure time analysis).@@@@1@33@@danf@17-8-2009 10711560@unknown@formal@none@1@S@Censoring and non-normality which are characteristic of survival data generate difficulty when trying to analyze the data using conventional statistical models such as multiple linear regression.@@@@1@26@@danf@17-8-2009 10711570@unknown@formal@none@1@S@The Normal distribution, being a symmetric distribution, takes positive as well as negative values, but duration by its very nature cannot be negative and therefore normality cannot be assumed when dealing with duration/survival data.@@@@1@34@@danf@17-8-2009 10711580@unknown@formal@none@1@S@Hence the normality assumption of regression models is violated.@@@@1@9@@danf@17-8-2009 10711590@unknown@formal@none@1@S@A censored observation is defined as an observation with incomplete information.@@@@1@11@@danf@17-8-2009 10711600@unknown@formal@none@1@S@Censoring introduces distortions into traditional statistical methods and is essentially a defect of the sample data.@@@@1@16@@danf@17-8-2009 10711610@unknown@formal@none@1@S@The assumption is that if the data were not censored it would be representative of the population of interest.@@@@1@19@@danf@17-8-2009 10711620@unknown@formal@none@1@S@In survival analysis, censored observations arise whenever the dependent variable of interest represents the time to a terminal event, and the duration of the study is limited in time.@@@@1@29@@danf@17-8-2009 10711630@unknown@formal@none@1@S@An important concept in survival analysis is the hazard rate.@@@@1@10@@danf@17-8-2009 10711640@unknown@formal@none@1@S@The hazard rate is defined as the probability that the event will occur at time t conditional on surviving until time t.@@@@1@22@@danf@17-8-2009 10711650@unknown@formal@none@1@S@Another concept related to the hazard rate is the survival function which can be defined as the probability of surviving to time t.@@@@1@23@@danf@17-8-2009 10711660@unknown@formal@none@1@S@Most models try to model the hazard rate by choosing the underlying distribution depending on the shape of the hazard function.@@@@1@21@@danf@17-8-2009 10711670@unknown@formal@none@1@S@A distribution whose hazard function slopes upward is said to have positive duration dependence, a decreasing hazard shows negative duration dependence whereas constant hazard is a process with no memory usually characterized by the exponential distribution.@@@@1@36@@danf@17-8-2009 10711680@unknown@formal@none@1@S@Some of the distributional choices in survival models are: F, gamma, Weibull, log normal, inverse normal, exponential etc.@@@@1@18@@danf@17-8-2009 10711690@unknown@formal@none@1@S@All these distributions are for a non-negative random variable.@@@@1@9@@danf@17-8-2009 10711700@unknown@formal@none@1@S@Duration models can be parametric, non-parametric or semi-parametric.@@@@1@8@@danf@17-8-2009 10711710@unknown@formal@none@1@S@Some of the models commonly used are Kaplan-Meier, Cox proportional hazard model (non parametric).@@@@1@14@@danf@17-8-2009 
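To make the survival-function concept described above concrete, the following minimal Python sketch computes the Kaplan-Meier (product-limit) estimate of S(t) from a small set of durations with censoring flags. The data, variable names and the helper function are hypothetical and purely illustrative; in practice one would typically rely on an established implementation such as R's survival package or the Python lifelines library.
<source lang="python">
import numpy as np

def kaplan_meier(durations, event_observed):
    """Return the distinct event times and the estimated survival probability S(t) at each.

    durations      : follow-up times (e.g. months until the terminal event or censoring)
    event_observed : 1 if the terminal event occurred, 0 if the observation is censored
    """
    durations = np.asarray(durations, dtype=float)
    event_observed = np.asarray(event_observed, dtype=int)

    times = np.unique(durations[event_observed == 1])   # distinct observed event times
    survival = []
    s = 1.0
    for t in times:
        at_risk = np.sum(durations >= t)                 # subjects still under observation at t
        events = np.sum((durations == t) & (event_observed == 1))
        s *= 1.0 - events / at_risk                      # product-limit update
        survival.append(s)
    return times, np.array(survival)

# Hypothetical durations (months) and censoring flags, for illustration only.
durations = [2, 3, 3, 5, 6, 8, 8, 12]
observed  = [1, 1, 0, 1, 0, 1, 1, 0]
for t, s in zip(*kaplan_meier(durations, observed)):
    print(f"S({t:g}) = {s:.3f}")
</source>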
10711720@unknown@formal@none@1@S@==== Classification and regression trees====@@@@1@5@@danf@17-8-2009 10711730@unknown@formal@none@1@S@Classification and regression trees (CART) is a [[non-parametric statistics|non-parametric]] technique that produces either classification or regression trees, depending on whether the dependent variable is categorical or numeric, respectively.@@@@1@28@@danf@17-8-2009 10711740@unknown@formal@none@1@S@Trees are formed by a collection of rules based on values of certain variables in the modeling data set@@@@1@19@@danf@17-8-2009 10711750@unknown@formal@none@1@S@* Rules are selected based on how well splits based on variables’ values can differentiate observations based on the dependent variable@@@@1@21@@danf@17-8-2009 10711760@unknown@formal@none@1@S@* Once a rule is selected and splits a node into two, the same logic is applied to each “child” node (i.e. it is a recursive procedure)@@@@1@27@@danf@17-8-2009 10711770@unknown@formal@none@1@S@* Splitting stops when CART detects no further gain can be made, or some pre-set stopping rules are met@@@@1@19@@danf@17-8-2009 10711780@unknown@formal@none@1@S@Each branch of the tree ends in a terminal node@@@@1@10@@danf@17-8-2009 10711790@unknown@formal@none@1@S@* Each observation falls into one and exactly one terminal node@@@@1@11@@danf@17-8-2009 10711800@unknown@formal@none@1@S@* Each terminal node is uniquely defined by a set of rules@@@@1@12@@danf@17-8-2009 10711810@unknown@formal@none@1@S@A very popular method for predictive analytics is Leo Breiman's [[Random forests]] or derived versions of this technique like [[Random multinomial logit]].@@@@1@22@@danf@17-8-2009 10711820@unknown@formal@none@1@S@==== Multivariate adaptive regression splines====@@@@1@5@@danf@17-8-2009 10711830@unknown@formal@none@1@S@[[Multivariate adaptive regression splines]] (MARS) is a [[Non-parametric statistics|non-parametric]] technique that builds flexible models by fitting [[piecewise linear regression]]s.@@@@1@19@@danf@17-8-2009 10711840@unknown@formal@none@1@S@An important concept associated with regression splines is that of a knot.@@@@1@12@@danf@17-8-2009 10711850@unknown@formal@none@1@S@Knot is where one local regression model gives way to another and thus is the point of intersection between two splines.@@@@1@21@@danf@17-8-2009 10711860@unknown@formal@none@1@S@In multivariate and adaptive regression splines, [[basis function]]s are the tool used for generalizing the search for knots.@@@@1@18@@danf@17-8-2009 10711870@unknown@formal@none@1@S@Basis functions are a set of functions used to represent the information contained in one or more variables.@@@@1@18@@danf@17-8-2009 10711880@unknown@formal@none@1@S@Multivariate and Adaptive Regression Splines model almost always creates the basis functions in pairs.@@@@1@14@@danf@17-8-2009 10711890@unknown@formal@none@1@S@Multivariate and adaptive regression spline approach deliberately overfits the model and then prunes to get to the optimal model.@@@@1@19@@danf@17-8-2009 10711900@unknown@formal@none@1@S@The algorithm is computationally very intensive and in practice we are required to specify an upper limit on the number of basis functions.@@@@1@23@@danf@17-8-2009 10711910@unknown@formal@none@1@S@=== Machine learning techniques===@@@@1@4@@danf@17-8-2009 10711920@unknown@formal@none@1@S@[[Machine learning]], a branch of artificial intelligence, was originally employed to develop techniques to enable computers to learn.@@@@1@18@@danf@17-8-2009 10711930@unknown@formal@none@1@S@Today, since it 
includes a number of advanced statistical methods for regression and classification, it finds application in a wide variety of fields including [[medical diagnostics]], [[credit card fraud detection]], [[Face recognition|face]] and [[speech recognition]] and analysis of the [[stock market]].@@@@1@41@@danf@17-8-2009 10711940@unknown@formal@none@1@S@In certain applications it is sufficient to directly predict the dependent variable without focusing on the underlying relationships between variables.@@@@1@20@@danf@17-8-2009 10711950@unknown@formal@none@1@S@In other cases, the underlying relationships can be very complex and the mathematical form of the dependencies unknown.@@@@1@18@@danf@17-8-2009 10711960@unknown@formal@none@1@S@For such cases, machine learning techniques emulate [[human cognition]] and learn from training examples to predict future events.@@@@1@18@@danf@17-8-2009 10711970@unknown@formal@none@1@S@A brief discussion of some of these methods commonly used for predictive analytics is provided below.@@@@1@16@@danf@17-8-2009 10711980@unknown@formal@none@1@S@A detailed study of machine learning can be found in Mitchell (1997).@@@@1@12@@danf@17-8-2009 10711990@unknown@formal@none@1@S@==== Neural networks====@@@@1@3@@danf@17-8-2009 10712000@unknown@formal@none@1@S@[[Neural networks]] are sophisticated [[Nonlinearity|nonlinear]] modeling techniques that are able to [[Model (abstract)|model]] complex functions.@@@@1@15@@danf@17-8-2009 10712010@unknown@formal@none@1@S@They can be applied to problems of [[Time series|prediction]], [[Statistical classification|classification]] or [[Control theory|control]] in a wide spectrum of fields such as [[finance]], [[cognitive psychology]]/[[cognitive neuroscience|neuroscience]], [[medicine]], [[engineering]], and [[physics]].@@@@1@30@@danf@17-8-2009 10712020@unknown@formal@none@1@S@Neural networks are used when the exact nature of the relationship between inputs and output is not known.@@@@1@18@@danf@17-8-2009 10712030@unknown@formal@none@1@S@A key feature of neural networks is that they learn the relationship between inputs and output through training.@@@@1@18@@danf@17-8-2009 10712040@unknown@formal@none@1@S@There are two types of training in neural networks used by different networks, [[Supervised learning|supervised]] and [[Unsupervised learning|unsupervised]] training, with supervised being the most common one.@@@@1@26@@danf@17-8-2009 10712050@unknown@formal@none@1@S@Some examples of neural network training techniques are [[backpropagation]], quick propagation, [[Conjugate gradient method|conjugate gradient descent]], [[Radial basis function|projection operator]], Delta-Bar-Delta etc.@@@@1@22@@danf@17-8-2009 10712060@unknown@formal@none@1@S@These are applied to network architectures such as multilayer [[perceptron]]s, [[Self-organizing map|Kohonen network]]s, [[Hopfield network]]s, etc.@@@@1@16@@danf@17-8-2009 10712070@unknown@formal@none@1@S@====Radial basis functions====@@@@1@3@@danf@17-8-2009 10712080@unknown@formal@none@1@S@A [[radial basis function]] (RBF) is a function which has built into it a distance criterion with respect to a center.@@@@1@21@@danf@17-8-2009 10712090@unknown@formal@none@1@S@Such functions can be used very efficiently for interpolation and for smoothing of data.@@@@1@14@@danf@17-8-2009 10712100@unknown@formal@none@1@S@Radial basis functions have been applied in the area of [[neural network]]s where they are used as a replacement for the sigmoidal transfer function.@@@@1@24@@danf@17-8-2009 
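As a small illustration of the distance-based criterion just described, the sketch below defines a Gaussian radial basis function and uses it for simple kernel-style smoothing of a noisy signal. The width parameter, the synthetic data and the helper names are assumptions made for this example only, not part of the original text; the next paragraph returns to RBF networks.
<source lang="python">
import numpy as np

def gaussian_rbf(x, center, width=1.0):
    """Gaussian radial basis function: its value depends only on the distance ||x - center||."""
    return np.exp(-np.sum((x - center) ** 2) / (2.0 * width ** 2))

# Smooth a noisy 1-D signal by a weighted average whose weights come from the RBF
# centred on the query point (a simple use of the distance criterion for smoothing).
xs = np.linspace(0.0, 10.0, 50)
ys = np.sin(xs) + 0.1 * np.random.default_rng(0).normal(size=xs.size)

def rbf_smooth(x0, xs, ys, width=0.5):
    weights = np.array([gaussian_rbf(x, x0, width) for x in xs])
    return np.sum(weights * ys) / np.sum(weights)

print(rbf_smooth(5.0, xs, ys))  # smoothed estimate near sin(5)
</source>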
10712110@unknown@formal@none@1@S@Such networks have 3 layers, the input layer, the hidden layer with the RBF non-linearity and a linear output layer.@@@@1@20@@danf@17-8-2009 10712120@unknown@formal@none@1@S@The most popular choice for the non-linearity is the Gaussian.@@@@1@10@@danf@17-8-2009 10712130@unknown@formal@none@1@S@RBF networks have the advantage of not being locked into local minima as do the [[feed-forward]] networks such as the multilayer perceptron.@@@@1@22@@danf@17-8-2009 10712140@unknown@formal@none@1@S@==== Support vector machines====@@@@1@4@@danf@17-8-2009 10712150@unknown@formal@none@1@S@[[Support Vector Machine]]s (SVM) are used to detect and exploit complex patterns in data by clustering, classifying and ranking the data.@@@@1@21@@danf@17-8-2009 10712160@unknown@formal@none@1@S@They are learning machines that are used to perform binary classifications and regression estimations.@@@@1@14@@danf@17-8-2009 10712170@unknown@formal@none@1@S@They commonly use kernel based methods to apply linear classification techniques to non-linear classification problems.@@@@1@15@@danf@17-8-2009 10712180@unknown@formal@none@1@S@There are a number of types of SVM such as linear, polynomial, sigmoid etc.@@@@1@14@@danf@17-8-2009 10712190@unknown@formal@none@1@S@==== Naïve Bayes====@@@@1@3@@danf@17-8-2009 10712200@unknown@formal@none@1@S@[[Naive Bayes classifier|Naïve Bayes]] based on Bayes conditional probability rule is used for performing classification tasks.@@@@1@16@@danf@17-8-2009 10712210@unknown@formal@none@1@S@Naïve Bayes assumes the predictors are statistically independent which makes it an effective classification tool that is easy to interpret.@@@@1@20@@danf@17-8-2009 10712220@unknown@formal@none@1@S@It is best employed when faced with the problem of ‘curse of dimensionality’ i.e. 
when the number of predictors is very high.@@@@1@22@@danf@17-8-2009 10712230@unknown@formal@none@1@S@==== k-nearest neighbours====@@@@1@3@@danf@17-8-2009 10712240@unknown@formal@none@1@S@The [[K-nearest neighbor algorithm|nearest neighbour algorithm]] (KNN) belongs to the class of pattern recognition statistical methods.@@@@1@16@@danf@17-8-2009 10712250@unknown@formal@none@1@S@The method does not impose a priori any assumptions about the distribution from which the modeling sample is drawn.@@@@1@19@@danf@17-8-2009 10712260@unknown@formal@none@1@S@It involves a training set with both positive and negative values.@@@@1@11@@danf@17-8-2009 10712270@unknown@formal@none@1@S@A new sample is classified by calculating the distance to the nearest neighbouring training case.@@@@1@15@@danf@17-8-2009 10712280@unknown@formal@none@1@S@The sign of that point will determine the classification of the sample.@@@@1@12@@danf@17-8-2009 10712290@unknown@formal@none@1@S@In the k-nearest neighbour classifier, the k nearest points are considered and the sign of the majority is used to classify the sample.@@@@1@23@@danf@17-8-2009 10712300@unknown@formal@none@1@S@The performance of the kNN algorithm is influenced by three main factors: (1) the distance measure used to locate the nearest neighbours; (2) the decision rule used to derive a classification from the k-nearest neighbours; and (3) the number of neighbours used to classify the new sample.@@@@1@47@@danf@17-8-2009 10712310@unknown@formal@none@1@S@It can be proved that, unlike other methods, this method is universally asymptotically convergent, i.e.: as the size of the training set increases, if the observations are iid, regardless of the distribution from which the sample is drawn, the predicted class will converge to the class assignment that minimizes misclassification error.@@@@1@51@@danf@17-8-2009 10712320@unknown@formal@none@1@S@See Devroye et al.@@@@1@4@@danf@17-8-2009 10712330@unknown@formal@none@1@S@==Popular tools==@@@@1@2@@danf@17-8-2009 10712340@unknown@formal@none@1@S@There are numerous tools available in the marketplace which help with the execution of predictive analytics.@@@@1@16@@danf@17-8-2009 10712350@unknown@formal@none@1@S@These range from those which need very little user sophistication to those that are designed for the expert practitioner.@@@@1@19@@danf@17-8-2009 10712360@unknown@formal@none@1@S@The difference between these tools is often in the level of customization and heavy data lifting allowed.@@@@1@17@@danf@17-8-2009 10712370@unknown@formal@none@1@S@For traditional statistical modeling, some of the popular tools are [[DAP (software)|DAP]]/[[SAS Institute|SAS]], S-Plus, [[PSPP]]/[[SPSS]] and Stata.@@@@1@17@@danf@17-8-2009 10712380@unknown@formal@none@1@S@For machine learning/data mining type of applications, KnowledgeSEEKER, KnowledgeSTUDIO, Enterprise Miner, GeneXproTools, [[Viscovery]], Clementine, [[KXEN Inc.|KXEN Analytic Framework]], [[InforSense]] and Excel Miner are some of the popularly used options.@@@@1@29@@danf@17-8-2009 10712390@unknown@formal@none@1@S@Classification Tree analysis can be performed using CART software.@@@@1@9@@danf@17-8-2009 10712400@unknown@formal@none@1@S@SOMine is a predictive analytics tool based on [[self-organizing map]]s (SOMs) available from [[Viscovery Software]].@@@@1@15@@danf@17-8-2009 10712410@unknown@formal@none@1@S@[[R (programming_language)|R]] is a very powerful tool that can be used to perform almost any kind of statistical analysis, and is freely downloadable.@@@@1@23@@danf@17-8-2009 
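Returning to the k-nearest neighbour classifier described earlier in this section, the short sketch below illustrates the three factors listed there (distance measure, decision rule, and number of neighbours) in plain NumPy. The training data, the query point and the choice k=3 are hypothetical values chosen only for illustration.
<source lang="python">
import numpy as np

def knn_predict(X_train, y_train, x_new, k=3):
    distances = np.linalg.norm(X_train - x_new, axis=1)  # distance measure (factor 1)
    nearest = np.argsort(distances)[:k]                  # the k nearest training cases (factor 3)
    votes = y_train[nearest]
    return np.bincount(votes).argmax()                   # majority-vote decision rule (factor 2)

X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.1], [0.9, 1.0]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.95, 1.05]), k=3))  # -> 1
</source>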
10712420@unknown@formal@none@1@S@[[WEKA]] is a freely available [[open source|open-source]] collection of [[machine learning]] methods for pattern classification, regression, clustering, and some types of meta-learning, which can be used for predictive analytics.@@@@1@29@@danf@17-8-2009 10712430@unknown@formal@none@1@S@[[RapidMiner]] is another freely available integrated [[open source|open-source]] software environment for predictive analytics, [[data mining]], and [[machine learning]] fully integrating WEKA and providing an even larger number of methods for predictive analytics.@@@@1@32@@danf@17-8-2009 10712440@unknown@formal@none@1@S@Recently, in an attempt to provide a standard language for expressing predictive models, the [[Predictive Model Markup Language]] (PMML) has been proposed.@@@@1@22@@danf@17-8-2009 10712450@unknown@formal@none@1@S@Such an XML-based language provides a way for the different tools to define predictive models and to share these between PMML compliant applications.@@@@1@23@@danf@17-8-2009 10712460@unknown@formal@none@1@S@Several tools already produce or consume PMML documents; these include [[ADAPA]], [[IBM DB2]] Warehouse, CART, SAS Enterprise Miner, and [[SPSS]].@@@@1@20@@danf@17-8-2009 10712470@unknown@formal@none@1@S@Predictive analytics has also found its way into the IT lexicon, most notably in the area of IT Automation.@@@@1@19@@danf@17-8-2009 10712480@unknown@formal@none@1@S@Vendors such as [[Stratavia]] and their [[Data Palette]] product offer predictive analytics as part of their automation platform, predicting how resources will behave in the future and automating the environment accordingly.@@@@1@31@@danf@17-8-2009 10712490@unknown@formal@none@1@S@The widespread use of predictive analytics in industry has led to the proliferation of numerous productized solutions firms.@@@@1@18@@danf@17-8-2009 10712500@unknown@formal@none@1@S@Some of them are highly specialized (focusing, for example, on fraud detection, automatic sales lead generation or response modeling) in a specific domain ([[Fair Isaac]] for credit card scores) or industry verticals (MarketRx in Pharmaceutical).@@@@1@34@@danf@17-8-2009 10712510@unknown@formal@none@1@S@Others provide predictive analytics services in support of a wide range of business problems across industry verticals ([[Fifth C]]).@@@@1@19@@danf@17-8-2009 10712520@unknown@formal@none@1@S@Predictive analytics competitions are also fairly common and often pit academics against industry practitioners (see, for example, the KDD Cup).@@@@1@19@@danf@17-8-2009 10712530@unknown@formal@none@1@S@==Conclusion==@@@@1@1@@danf@17-8-2009 10712540@unknown@formal@none@1@S@Predictive analytics adds great value to a business's decision-making capabilities by allowing it to formulate smart policies on the basis of predictions of future outcomes.@@@@1@26@@danf@17-8-2009 10712550@unknown@formal@none@1@S@A broad range of tools and techniques are available for this type of analysis and their selection is determined by the analytical maturity of the firm as well as the specific requirements of the problem being solved.@@@@1@37@@danf@17-8-2009 10712560@unknown@formal@none@1@S@==Education==@@@@1@1@@danf@17-8-2009 10712570@unknown@formal@none@1@S@Predictive analytics is taught at the following institutions:@@@@1@8@@danf@17-8-2009 10712580@unknown@formal@none@1@S@* Ghent University, Belgium: [http://www.mma.UGent.be Master of Marketing Analysis], an 8-month advanced master's degree taught in English with strong emphasis on applications of predictive analytics in 
Analytical CRM.@@@@1@28@@danf@17-8-2009 10720010@unknown@formal@none@1@S@
RapidMiner
@@@@1@1@@danf@17-8-2009 10720020@unknown@formal@none@1@S@'''RapidMiner''' (formerly YALE (Yet Another Learning Environment)) is an environment for [[machine learning]] and [[data mining]] experiments.@@@@1@17@@danf@17-8-2009 10720030@unknown@formal@none@1@S@It allows experiments to be made up of a large number of arbitrarily nestable operators, described in [[XML]] files which can easily be created with RapidMiner's [[graphical user interface]].@@@@1@29@@danf@17-8-2009 10720040@unknown@formal@none@1@S@Applications of RapidMiner cover both research and real-world data mining tasks.@@@@1@11@@danf@17-8-2009 10720050@unknown@formal@none@1@S@The initial version has been developed by the Artificial Intelligence Unit of [[Dortmund University of Technology|University of Dortmund]] since [[2001]].@@@@1@20@@danf@17-8-2009 10720060@unknown@formal@none@1@S@It is distributed under a [[GNU]] license, and has been hosted by [[SourceForge]] since [[2004]].@@@@1@15@@danf@17-8-2009 10720070@unknown@formal@none@1@S@RapidMiner provides more than 400 operators for all main machine learning procedures, including input and output, and data preprocessing and visualization.@@@@1@21@@danf@17-8-2009 10720080@unknown@formal@none@1@S@It is written in the [[Java (programming language)|Java programming language]] and therefore can work on all popular operating systems.@@@@1@19@@danf@17-8-2009 10720090@unknown@formal@none@1@S@It also integrates all learning schemes and attribute evaluators of the [[Weka (machine learning)|Weka]] learning environment.@@@@1@16@@danf@17-8-2009 10720100@unknown@formal@none@1@S@== Properties ==@@@@1@3@@danf@17-8-2009 10720110@unknown@formal@none@1@S@Some properties of RapidMiner are:@@@@1@5@@danf@17-8-2009 10720120@unknown@formal@none@1@S@* written in Java@@@@1@4@@danf@17-8-2009 10720130@unknown@formal@none@1@S@* [[knowledge discovery]] processes are modeled as operator trees@@@@1@9@@danf@17-8-2009 10720140@unknown@formal@none@1@S@* internal XML representation ensures standardized interchange format of data mining experiments@@@@1@12@@danf@17-8-2009 10720150@unknown@formal@none@1@S@* scripting language allows for automatic large-scale experiments@@@@1@8@@danf@17-8-2009 10720160@unknown@formal@none@1@S@* multi-layered data view concept ensures efficient and transparent data handling@@@@1@11@@danf@17-8-2009 10720170@unknown@formal@none@1@S@* [[graphical user interface]], [[command line]] mode ([[Batch file|batch mode]]), and [[Java API]] for using RapidMiner from your own programs@@@@1@20@@danf@17-8-2009 10720180@unknown@formal@none@1@S@* [[plugin]] and [[Extension (computing)|extension]] mechanisms, several plugins already exist@@@@1@10@@danf@17-8-2009 10720190@unknown@formal@none@1@S@* [[plotting]] facility offering a large set of high-dimensional visualization schemes for data and models@@@@1@15@@danf@17-8-2009 10720200@unknown@formal@none@1@S@* applications include [[text mining]], multimedia mining, feature engineering, data stream mining and tracking drifting concepts, development of ensemble methods, and distributed data mining.@@@@1@24@@danf@17-8-2009 10730010@unknown@formal@none@1@S@
Russian language
@@@@1@2@@danf@17-8-2009 10730020@unknown@formal@none@1@S@'''Russian''' ([[:Media:Ru-russkiy jizyk.ogg|]] , [[Romanization of Russian|transliteration]]: , {{IPA-ru|ˈruskʲɪj jɪˈzɨk}}) is the most geographically widespread language of [[Eurasia]], the most widely spoken of the [[Slavic languages]], and the largest [[native language]] in [[Europe]].@@@@1@38@@danf@17-8-2009 10730030@unknown@formal@none@1@S@Russian belongs to the family of [[Indo-European languages]] and is one of three (or, according to some authorities, four) living members of the [[East Slavic languages]], the others being [[Belarusian language|Belarusian]] and [[Ukrainian language|Ukrainian]] (and possibly [[Rusyn language|Rusyn]], often considered a dialect of Ukrainian).@@@@1@45@@danf@17-8-2009 10730040@unknown@formal@none@1@S@It is also spoken in the countries of the [[Russophone]] world.@@@@1@10@@danf@17-8-2009 10730050@unknown@formal@none@1@S@Written examples of Old East Slavonic are attested from the 10th century onwards.@@@@1@13@@danf@17-8-2009 10730060@unknown@formal@none@1@S@Today Russian is widely used outside [[Russia]].@@@@1@7@@danf@17-8-2009 10730070@unknown@formal@none@1@S@It is applied as a means of coding and storage of universal knowledge — 60–70% of all world information is published in English and Russian.@@@@1@26@@danf@17-8-2009 10730080@unknown@formal@none@1@S@Over a quarter of the world's scientific literature is published in Russian.@@@@1@12@@danf@17-8-2009 10730090@unknown@formal@none@1@S@Russian is also a necessary component of world communications systems (broadcasts, air and space communication, etc.).@@@@1@16@@danf@17-8-2009 10730100@unknown@formal@none@1@S@Due to the status of the [[Soviet Union]] as a [[superpower]], Russian had great political importance in the 20th century.@@@@1@20@@danf@17-8-2009 10730110@unknown@formal@none@1@S@Hence, the language is one of the [[United Nations#Languages|official languages]] of the [[United Nations]].@@@@1@14@@danf@17-8-2009 10730120@unknown@formal@none@1@S@Russian distinguishes between [[consonant]] [[phoneme]]s with [[palatalization|palatal]] [[secondary articulation]] and those without, the so-called ''soft'' and ''hard'' sounds.@@@@1@18@@danf@17-8-2009 10730130@unknown@formal@none@1@S@This distinction is found between pairs of almost all consonants and is one of the most distinguishing features of the language.@@@@1@21@@danf@17-8-2009 10730140@unknown@formal@none@1@S@Another important aspect is the [[vowel reduction|reduction]] of [[stress (linguistics)|unstressed]] [[vowel]]s, which is somewhat similar to [[Unstressed and reduced vowels in English|that of English]].@@@@1@24@@danf@17-8-2009 10730150@unknown@formal@none@1@S@Stress, which is unpredictable, is not normally indicated orthographically.@@@@1@9@@danf@17-8-2009 10730160@unknown@formal@none@1@S@According to the Institute of Russian Language of the Russian Academy of Sciences, an optional [[acute accent]] () may, and sometimes should, be used to mark stress.@@@@1@27@@danf@17-8-2009 10730170@unknown@formal@none@1@S@For example, it is used to distinguish between otherwise identical words, especially when context doesn't make it obvious: ''замо́к/за́мок'' (lock/castle), ''сто́ящий/стоя́щий'' (worthwhile/standing), ''чудно́/чу́дно'' (this is odd/this is marvellous), ''молоде́ц/мо́лодец'' (attaboy/fine young man), ''узна́ю/узнаю́'' (I shall learn it/I am learning it), ''отреза́ть/отре́зать'' (infinitive for "cut"/perfective for "cut"); to indicate the proper pronunciation of uncommon words, 
especially personal and family names (''афе́ра, гу́ру, Гарси́а, Оле́ша, Фе́рми''), and to express the stressed word in the sentence (''Ты́ съел печенье?/Ты съе́л печенье?/Ты съел пече́нье?'' - Was it you who ate the cookie?/Did you eat the cookie?/Was the cookie your meal?).@@@@1@96@@danf@17-8-2009 10730180@unknown@formal@none@1@S@Acute accents are mandatory in lexical dictionaries and books intended to be used either by children or foreign readers.@@@@1@19@@danf@17-8-2009 10730190@unknown@formal@none@1@S@==Classification==@@@@1@1@@danf@17-8-2009 10730200@unknown@formal@none@1@S@Russian is a [[Slavic languages|Slavic language]] in the [[Indo-European Languages|Indo-European family]].@@@@1@11@@danf@17-8-2009 10730210@unknown@formal@none@1@S@From the point of view of the [[spoken language]], its closest relatives are [[Ukrainian language|Ukrainian]] and [[Belarusian language|Belarusian]], the other two national languages in the [[East Slavic languages|East Slavic]] group.@@@@1@30@@danf@17-8-2009 10730220@unknown@formal@none@1@S@In many places in eastern [[Ukraine]] and [[Belarus]], these languages are spoken interchangeably, and in certain areas traditional bilingualism resulted in language mixture, e.g. [[Surzhyk]] in eastern Ukraine and [[Trasianka]] in Belarus.@@@@1@32@@danf@17-8-2009 10730240@unknown@formal@none@1@S@An East Slavic [[Old Novgorod dialect]], although it vanished during the fifteenth or sixteenth century, is sometimes considered to have played a significant role in the formation of the modern Russian language.@@@@1@30@@danf@17-8-2009 10730250@unknown@formal@none@1@S@The vocabulary (mainly abstract and literary words), principles of word formation, and, to some extent, inflections and literary style of Russian have also been influenced by [[Church Slavonic language|Church Slavonic]], a developed and partly adopted form of the [[South Slavic languages|South Slavic]] [[Old Church Slavonic]] language used by the [[Russian Orthodox Church]].@@@@1@52@@danf@17-8-2009 10730260@unknown@formal@none@1@S@However, the East Slavic forms have tended to be used exclusively in the various dialects that are experiencing a rapid decline.@@@@1@21@@danf@17-8-2009 10730270@unknown@formal@none@1@S@In some cases, both the [[East Slavic languages|East Slavic]] and the [[Church Slavonic]] forms are in use, with slightly different meanings.@@@@1@21@@danf@17-8-2009 10730280@unknown@formal@none@1@S@''For details, see [[Russian phonology]] and [[History of the Russian language]].''@@@@1@11@@danf@17-8-2009 10730290@unknown@formal@none@1@S@Russian phonology and syntax (especially in northern dialects) have also been influenced to some extent by the numerous Finnic languages of the [[Finno-Ugric languages|Finno-Ugric subfamily]]: [[Merya language|Merya]], [[Moksha language|Moksha]], [[Muromian language|Muromian]], the language of the [[Meshchera]], [[Veps language|Veps]], et cetera.@@@@1@40@@danf@17-8-2009 10730300@unknown@formal@none@1@S@These languages, some of them now extinct, used to be spoken in the center and in the north of what is now the European part of Russia.@@@@1@27@@danf@17-8-2009 10730310@unknown@formal@none@1@S@They came in contact with Eastern Slavic as far back as the early Middle Ages and eventually served as a substratum for the modern Russian language.@@@@1@25@@danf@17-8-2009 10730320@unknown@formal@none@1@S@The Russian dialects spoken north, north-east and north-west of [[Moscow]] have a considerable number of words of Finno-Ugric origin.@@@@1@19@@danf@17-8-2009 
10730330@unknown@formal@none@1@S@Over the course of centuries, the vocabulary and literary style of Russian have also been influenced by Turkic/Caucasian/Central Asian languages, as well as Western/Central European languages such as [[Polish language|Polish]], [[Latin]], [[Dutch language|Dutch]], [[German language|German]], [[French language|French]], and [[English language|English]].@@@@1@40@@danf@17-8-2009 10730340@unknown@formal@none@1@S@According to the [[Defense Language Institute]] in [[Monterey, California]], Russian is classified as a level III language in terms of learning difficulty for native English speakers, requiring approximately 780 hours of immersion instruction to achieve intermediate fluency.@@@@1@37@@danf@17-8-2009 10730350@unknown@formal@none@1@S@It is also regarded by the [[United States Intelligence Community]] as a "hard target" language, due to both its difficulty to master for English speakers as well as due to its critical role in American world policy.@@@@1@37@@danf@17-8-2009 10730360@unknown@formal@none@1@S@==Geographic distribution==@@@@1@2@@danf@17-8-2009 10730370@unknown@formal@none@1@S@Russian is primarily spoken in [[Russia]] and, to a lesser extent, the other countries that were once constituent republics of the [[Soviet Union|USSR]].@@@@1@23@@danf@17-8-2009 10730380@unknown@formal@none@1@S@Until [[1917]], it was the sole official language of the [[Russian Empire]].@@@@1@12@@danf@17-8-2009 10730390@unknown@formal@none@1@S@During the Soviet period, the policy toward the languages of the various other ethnic groups fluctuated in practice.@@@@1@18@@danf@17-8-2009 10730400@unknown@formal@none@1@S@Though each of the constituent republics had its own official language, the unifying role and superior status was reserved for Russian.@@@@1@21@@danf@17-8-2009 10730410@unknown@formal@none@1@S@Following the break-up of [[1991]], several of the newly independent states have encouraged their native languages, which has partly reversed the privileged status of Russian, though its role as the language of post-Soviet national intercourse throughout the region has continued.@@@@1@40@@danf@17-8-2009 10730420@unknown@formal@none@1@S@In [[Latvia]], notably, its official recognition and legality in the classroom have been a topic of considerable debate in a country where more than one-third of the population is Russian-speaking, consisting mostly of post-[[World War II]] immigrants from Russia and other parts of the former [[USSR]] (Belarus, Ukraine).@@@@1@48@@danf@17-8-2009 10730430@unknown@formal@none@1@S@Similarly, in [[Estonia]], the Soviet-era immigrants and their Russian-speaking descendants constitute 25,6% of the country's current population and 58,6% of the native Estonian population is also able to speak Russian.@@@@1@30@@danf@17-8-2009 10730440@unknown@formal@none@1@S@In all, 67,8% of Estonia's population can speak Russian.@@@@1@9@@danf@17-8-2009 10730450@unknown@formal@none@1@S@In [[Kazakhstan]] and [[Kyrgyzstan]], Russian remains a co-official language with [[Kazakh language|Kazakh]] and [[Kyrgyz language|Kyrgyz]] respectively.@@@@1@16@@danf@17-8-2009 10730460@unknown@formal@none@1@S@Large Russian-speaking communities still exist in northern Kazakhstan, and ethnic Russians comprise 25.6 % of Kazakhstan's population.@@@@1@17@@danf@17-8-2009 10730470@unknown@formal@none@1@S@A much smaller Russian-speaking minority in [[Lithuania]] has represented less than 1/10 of the country's overall population.@@@@1@17@@danf@17-8-2009 10730480@unknown@formal@none@1@S@Nevertheless 
more than half of the population of the [[Baltic states]] are able to hold a conversation in Russian and almost all have at least some familiarity with the most basic spoken and written phrases.@@@@1@35@@danf@17-8-2009 10730490@unknown@formal@none@1@S@The Russian control of [[Finland]] in 1809–1918, however, has left few Russian speakers in Finland.@@@@1@15@@danf@17-8-2009 10730500@unknown@formal@none@1@S@There are 33,400 Russian speakers in Finland, amounting to 0.6% of the population.@@@@1@13@@danf@17-8-2009 10730510@unknown@formal@none@1@S@5000 (0.1%) of them are late 19th century and 20th century immigrants, and the rest are recent immigrants, who have arrived in the 90's and later.@@@@1@26@@danf@17-8-2009 10730520@unknown@formal@none@1@S@In the twentieth century, Russian was widely taught in the schools of the members of the old [[Warsaw Pact]] and in other [[Communist state|countries]] that used to be allies of the USSR.@@@@1@32@@danf@17-8-2009 10730530@unknown@formal@none@1@S@In particular, these countries include [[Poland]], [[Bulgaria]], the [[Czech Republic]], [[Slovakia]], [[Hungary]], [[Romania]], [[Albania]] and [[Cuba]].@@@@1@16@@danf@17-8-2009 10730540@unknown@formal@none@1@S@However, younger generations are usually not fluent in it, because Russian is no longer mandatory in the school system.@@@@1@19@@danf@17-8-2009 10730550@unknown@formal@none@1@S@It is currently the most widely-taught foreign language in [[Mongolia]].@@@@1@10@@danf@17-8-2009 10730560@unknown@formal@none@1@S@Russian is also spoken in [[Israel]] by at least 750,000 ethnic [[Jew]]ish immigrants from the former [[Soviet Union]] (1999 census).@@@@1@20@@danf@17-8-2009 10730570@unknown@formal@none@1@S@The Israeli [[Mass media|press]] and [[website]]s regularly publish material in Russian.@@@@1@11@@danf@17-8-2009 10730580@unknown@formal@none@1@S@Sizable Russian-speaking communities also exist in [[North America]], especially in large urban centers of the [[United States|U.S.]] and [[Canada]] such as [[New York City]], [[Philadelphia]], [[Boston, Massachusetts|Boston]], [[Los Angeles, California|Los Angeles]], [[San Francisco]], [[Seattle]], [[Toronto]], [[Baltimore]], [[Miami, Florida|Miami]], [[Chicago]], [[Denver]], and the [[Cleveland, Ohio|Cleveland]] suburb of [[Richmond Heights, Ohio|Richmond Heights]].@@@@1@50@@danf@17-8-2009 10730590@unknown@formal@none@1@S@In the former two, Russian-speaking groups total over half a million.@@@@1@11@@danf@17-8-2009 10730600@unknown@formal@none@1@S@In a number of locations they issue their own newspapers, and live in their self-sufficient neighborhoods (especially the generation of immigrants who started arriving in the early sixties).@@@@1@28@@danf@17-8-2009 10730610@unknown@formal@none@1@S@Only about a quarter of them are ethnic Russians, however.@@@@1@10@@danf@17-8-2009 10730620@unknown@formal@none@1@S@Before the [[dissolution of the Soviet Union]], the overwhelming majority of [[Russophone]]s in North America were Russian-speaking [[Jews]].@@@@1@18@@danf@17-8-2009 10730630@unknown@formal@none@1@S@Afterwards the influx from the countries of the former [[Soviet Union]] changed the statistics somewhat.@@@@1@15@@danf@17-8-2009 10730640@unknown@formal@none@1@S@According to the [[United States 2000 Census]], Russian is the primary language spoken in the homes of over 700,000 individuals living in the United States.@@@@1@25@@danf@17-8-2009 10730650@unknown@formal@none@1@S@Significant Russian-speaking groups also exist in [[Western Europe]].@@@@1@8@@danf@17-8-2009 
10730660@unknown@formal@none@1@S@These have been fed by several waves of immigrants since the beginning of the twentieth century, each with its own flavor of language.@@@@1@23@@danf@17-8-2009 10730670@unknown@formal@none@1@S@[[Germany]], the [[United Kingdom]], [[Spain]], [[France]], [[Italy]], [[Belgium]], [[Greece]], [[Brazil]], [[Norway]], [[Austria]], and [[Turkey]] have significant Russian-speaking communities totaling 3 million people.@@@@1@22@@danf@17-8-2009 10730680@unknown@formal@none@1@S@Two thirds of them are actually Russian-speaking descendants of [[German people|Germans]], [[Greeks]], [[Jews]], [[Armenians]], or [[Ukrainians]] who either repatriated after the [[USSR]] collapsed or are just looking for temporary employment.@@@@1@30@@danf@17-8-2009 10730690@unknown@formal@none@1@S@Recent estimates of the total number of speakers of Russian:@@@@1@10@@danf@17-8-2009 10730700@unknown@formal@none@1@S@===Official status===@@@@1@2@@danf@17-8-2009 10730710@unknown@formal@none@1@S@Russian is the official language of [[Russia]].@@@@1@7@@danf@17-8-2009 10730720@unknown@formal@none@1@S@It is also an official language of [[Belarus]], [[Kazakhstan]], [[Kyrgyzstan]], an unofficial but widely spoken language in [[Ukraine]] and the de facto official language of the [[List of unrecognized countries|unrecognized]] states of [[Transnistria]], [[South Ossetia]] and [[Abkhazia]].@@@@1@36@@danf@17-8-2009 10730730@unknown@formal@none@1@S@Russian is one of the [[United Nations#Languages|six official languages]] of the [[United Nations]].@@@@1@13@@danf@17-8-2009 10730740@unknown@formal@none@1@S@Education in Russian is still a popular choice for both Russian as a second language (RSL) and native speakers in Russia as well as many of the former Soviet republics.@@@@1@30@@danf@17-8-2009 10730750@unknown@formal@none@1@S@97% of the public school students of Russia, 75% in Belarus, 41% in Kazakhstan, 25% in [[Ukraine]], 23% in Kyrgyzstan, 21% in [[Moldova]], 7% in [[Azerbaijan]], 5% in [[Georgia (country)|Georgia]] and 2% in [[Armenia]] and [[Tajikistan]] receive their education only or mostly in Russian.@@@@1@44@@danf@17-8-2009 10730760@unknown@formal@none@1@S@The corresponding percentage of ethnic Russians, however, is 78% in [[Russia]], 10% in [[Belarus]], 26% in [[Kazakhstan]], 17% in [[Ukraine]], 9% in [[Kyrgyzstan]], 6% in [[Republic of Moldova|Moldova]], 2% in [[Azerbaijan]], 1.5% in [[Georgia (country)|Georgia]] and less than 1% in both [[Armenia]] and [[Tajikistan]].@@@@1@44@@danf@17-8-2009 10730770@unknown@formal@none@1@S@Russian-language schooling is also available in Latvia, Estonia and Lithuania, but due to education reforms, the number of subjects taught in Russian has been reduced at the high school level.@@@@1@29@@danf@17-8-2009 10730780@unknown@formal@none@1@S@The language has a co-official status alongside [[Moldovan language|Moldovan]] in the autonomies of [[Gagauzia]] and [[Transnistria]] in [[Moldova]], and in seven [[Romania]]n [[Commune in Romania|communes]] in [[Tulcea County|Tulcea]] and [[Constanţa County|Constanţa]] counties.@@@@1@32@@danf@17-8-2009 10730790@unknown@formal@none@1@S@In these localities, Russian-speaking [[Lipovans]], who are a recognized ethnic minority, make up more than 20% of the population.@@@@1@19@@danf@17-8-2009 10730800@unknown@formal@none@1@S@Thus, according to Romania's minority rights law, education, signage, and access to public administration and the justice system are provided in Russian alongside Romanian.@@@@1@24@@danf@17-8-2009 10730810@unknown@formal@none@1@S@In 
the [[Crimea|Autonomous Republic of Crimea]] in Ukraine, Russian is an officially recognized language alongside with [[Crimean Tatar language|Crimean Tatar]], but in reality, is the only language used by the government, thus being a ''[[de facto]]'' official language.@@@@1@38@@danf@17-8-2009 10730820@unknown@formal@none@1@S@===Dialects===@@@@1@1@@danf@17-8-2009 10730830@unknown@formal@none@1@S@Despite leveling after 1900, especially in matters of vocabulary, a number of dialects exist in Russia.@@@@1@16@@danf@17-8-2009 10730840@unknown@formal@none@1@S@Some linguists divide the dialects of the Russian language into two primary regional groupings, "Northern" and "Southern", with [[Moscow]] lying on the zone of transition between the two.@@@@1@28@@danf@17-8-2009 10730850@unknown@formal@none@1@S@Others divide the language into three groupings, Northern, Central and Southern, with Moscow lying in the Central region.@@@@1@18@@danf@17-8-2009 10730860@unknown@formal@none@1@S@[[Dialectology]] within Russia recognizes dozens of smaller-scale variants.@@@@1@8@@danf@17-8-2009 10730870@unknown@formal@none@1@S@The dialects often show distinct and non-standard features of pronunciation and intonation, vocabulary, and grammar.@@@@1@15@@danf@17-8-2009 10730880@unknown@formal@none@1@S@Some of these are relics of ancient usage now completely discarded by the standard language.@@@@1@15@@danf@17-8-2009 10730890@unknown@formal@none@1@S@The [[northern Russian dialects]] and those spoken along the [[Volga River]] typically pronounce unstressed {{IPA|/o/}} clearly (the phenomenon called [[vowel reduction in Russian#Back vowels|okanye]]/оканье).@@@@1@24@@danf@17-8-2009 10730900@unknown@formal@none@1@S@East of Moscow, particularly in [[Ryazan Region]], unstressed {{IPA|/e/}} and {{IPA|/a/}} following [[palatalization|palatalized]] consonants and preceding a stressed syllable are not reduced to {{IPA|[ɪ]}} (like in the Moscow dialect), being instead pronounced as {{IPA|/a/}} in such positions (e.g. несл'''и''' is pronounced as {{IPA|[nʲasˈlʲi]}}, not as {{IPA|[nʲɪsˈlʲi]}}) - this is called [[yakanye]]/ яканье; many southern dialects have a palatalized final {{IPA|/tʲ/}} in 3rd person forms of verbs (this is unpalatalized in the standard dialect) and a fricative {{IPA|[ɣ]}} where the standard dialect has {{IPA|[g]}}.@@@@1@83@@danf@17-8-2009 10730910@unknown@formal@none@1@S@However, in certain areas south of Moscow, e.g. in and around [[Tula, Russia|Tula]], {{IPA|/g/}} is pronounced as in the Moscow and northern dialects unless it precedes a voiceless plosive or a pause.@@@@1@32@@danf@17-8-2009 10730920@unknown@formal@none@1@S@In this position {{IPA|/g/}} is lenited and devoiced to the fricative {{IPA|[x]}}, e.g. друг {{IPA|[drux]}} (in Moscow's dialect, only Бог {{IPA|[box]}}, лёгкий {{IPA|[lʲɵxʲkʲɪj]}}, мягкий {{IPA|[ˈmʲæxʲkʲɪj]}} and some derivatives follow this rule).@@@@1@31@@danf@17-8-2009 10730930@unknown@formal@none@1@S@Some of these features (e.g. 
a [[debuccalization|debuccalized]] or [[lenition|lenited]] {{IPA|/g/}} and palatalized final {{IPA|/tʲ/}} in 3rd person forms of verbs) are also present in modern [[Ukrainian language|Ukrainian]], indicating either a linguistic continuum or strong influence one way or the other.@@@@1@40@@danf@17-8-2009 10730940@unknown@formal@none@1@S@The city of [[Veliky Novgorod]] has historically displayed a feature called chokanye/tsokanye (чоканье/цоканье), where {{IPA|/ʨ/}} and {{IPA|/ʦ/}} were confused (this is thought to be due to influence from [[Finnish language|Finnish]], which doesn't distinguish these sounds).@@@@1@35@@danf@17-8-2009 10730950@unknown@formal@none@1@S@So, '''ц'''апля ("heron") has been recorded as 'чапля'.@@@@1@8@@danf@17-8-2009 10730960@unknown@formal@none@1@S@Also, the second palatalization of [[Velar consonant|velar]]s did not occur there, so the so-called '''ě²''' (from the Proto-Slavonic diphthong *ai) did not cause {{IPA|/k, g, x/}} to shift to {{IPA|/ʦ, ʣ, s/}}; therefore where [[Standard Russian]] has '''ц'''епь ("chain"), the form '''к'''епь {{IPA|[kʲepʲ]}} is attested in earlier texts.@@@@1@48@@danf@17-8-2009 10730970@unknown@formal@none@1@S@Among the first to study Russian dialects was [[Mikhail Lomonosov|Lomonosov]] in the eighteenth century.@@@@1@14@@danf@17-8-2009 10730980@unknown@formal@none@1@S@In the nineteenth, [[Vladimir Dal]] compiled the first dictionary that included dialectal vocabulary.@@@@1@13@@danf@17-8-2009 10730990@unknown@formal@none@1@S@Detailed mapping of Russian dialects began at the turn of the twentieth century.@@@@1@13@@danf@17-8-2009 10731000@unknown@formal@none@1@S@In modern times, the monumental ''Dialectological Atlas of the Russian Language'' (''Диалектологический атлас русского языка'' {{IPA|[dʲɪɐˌlʲɛktəlɐˈgʲiʨɪskʲɪj ˈatləs ˈruskəvə jɪzɨˈka]}}), was published in 3 folio volumes 1986–1989, after four decades of preparatory work.@@@@1@32@@danf@17-8-2009 10731010@unknown@formal@none@1@S@The ''standard language'' is based on (but not identical to) the Moscow dialect.@@@@1@13@@danf@17-8-2009 10731020@unknown@formal@none@1@S@===Derived languages===@@@@1@2@@danf@17-8-2009 10731030@unknown@formal@none@1@S@* [[Balachka]] a dialect, spoken primarily by [[Cossacks]], in the regions of Don, [[Kuban]] and [[Terek]].@@@@1@16@@danf@17-8-2009 10731040@unknown@formal@none@1@S@* [[Fenya]], a criminal [[argot]] of ancient origin, with Russian grammar, but with distinct vocabulary.@@@@1@15@@danf@17-8-2009 10731050@unknown@formal@none@1@S@* [[Nadsat]], the fictional language spoken in '[[A Clockwork Orange]]' uses a lot of Russian words and Russian slang.@@@@1@19@@danf@17-8-2009 10731060@unknown@formal@none@1@S@* [[Surzhyk]] is a language with Russian and Ukrainian features, spoken in some areas of Ukraine@@@@1@16@@danf@17-8-2009 10731070@unknown@formal@none@1@S@* [[Trasianka]] is a language with Russian and Belarusian features used by a large portion of the rural population in [[Belarus]].@@@@1@21@@danf@17-8-2009 10731080@unknown@formal@none@1@S@* [[Quelia]], a pseudo pidgin of German and Russian.@@@@1@9@@danf@17-8-2009 10731090@unknown@formal@none@1@S@* [[Runglish]], Russian-English pidgin.@@@@1@4@@danf@17-8-2009 10731100@unknown@formal@none@1@S@This word is also used by English speakers to describe the way in which Russians attempt to speak English using Russian morphology and/or syntax.@@@@1@24@@danf@17-8-2009 10731110@unknown@formal@none@1@S@* [[Russenorsk language|Russenorsk]] is an extinct [[pidgin]] language with mostly Russian vocabulary and mostly 
[[Norwegian language|Norwegian]] grammar, used for communication between [[Russians]] and [[Norway|Norwegian]] traders in the Pomor trade in [[Finnmark]] and the [[Kola Peninsula]].@@@@1@35@@danf@17-8-2009 10731120@unknown@formal@none@1@S@==Writing system==@@@@1@2@@danf@17-8-2009 10731130@unknown@formal@none@1@S@===Alphabet===@@@@1@1@@danf@17-8-2009 10731140@unknown@formal@none@1@S@Russian is written using a modified version of the [[Cyrillic alphabet|Cyrillic (кириллица)]] alphabet.@@@@1@13@@danf@17-8-2009 10731150@unknown@formal@none@1@S@The Russian alphabet consists of 33 letters.@@@@1@7@@danf@17-8-2009 10731160@unknown@formal@none@1@S@The following table gives their upper case forms, along with [[help:IPA|IPA]] values for each letter's typical sound:@@@@1@17@@danf@17-8-2009 10731170@unknown@formal@none@1@S@Older letters of the Russian alphabet include <ѣ>, which merged to <е> ({{IPA|/e/}}); <і> and <ѵ>, which both merged to <и> ({{IPA|/i/}}); <ѳ>, which merged to <ф> ({{IPA|/f/}}); and <ѧ>, which merged to <я> ({{IPA|/ja/}} or {{IPA|/ʲa/}}).@@@@1@36@@danf@17-8-2009 10731180@unknown@formal@none@1@S@While these older letters have been abandoned at one time or another, they may be used in this and related articles.@@@@1@21@@danf@17-8-2009 10731190@unknown@formal@none@1@S@The [[yer]]s <ъ> and <ь> originally indicated the pronunciation of ''ultra-short'' or ''reduced'' {{IPA|/ŭ/}}, {{IPA|/ĭ/}}.@@@@1@15@@danf@17-8-2009 10731200@unknown@formal@none@1@S@The Russian alphabet has many systems of [[character encoding]].@@@@1@9@@danf@17-8-2009 10731210@unknown@formal@none@1@S@[[KOI8-R]] was designed by the government and was intended to serve as the standard encoding.@@@@1@15@@danf@17-8-2009 10731220@unknown@formal@none@1@S@This encoding is still used in UNIX-like operating systems.@@@@1@9@@danf@17-8-2009 10731230@unknown@formal@none@1@S@Nevertheless, the spread of [[MS-DOS]] and [[Microsoft Windows]] created chaos and ended up establishing different encodings as de facto standards.@@@@1@19@@danf@17-8-2009 10731240@unknown@formal@none@1@S@For communication purposes, a number of conversion applications were developed.@@@@1@10@@danf@17-8-2009 10731245@unknown@formal@none@1@S@"[[iconv]]" is an example that is supported by most versions of [[Linux]], [[Macintosh]] and some other [[operating system]]s.@@@@1@18@@danf@17-8-2009 10731250@unknown@formal@none@1@S@Most implementations (especially old ones) of the character encoding for the Russian language are aimed at simultaneous use of English and Russian characters only and do not include support for any other language.@@@@1@33@@danf@17-8-2009 10731260@unknown@formal@none@1@S@Certain hopes for a unification of the character encoding for the Russian alphabet are related to the [[Unicode|Unicode standard]], specifically designed for peaceful coexistence of various languages, including even [[dead language]]s.@@@@1@31@@danf@17-8-2009 10731270@unknown@formal@none@1@S@[[Unicode]] also supports the letters of the [[Early Cyrillic alphabet]], which have many similarities with the [[Greek alphabet]].@@@@1@18@@danf@17-8-2009 10731280@unknown@formal@none@1@S@===Orthography===@@@@1@1@@danf@17-8-2009 10731290@unknown@formal@none@1@S@Russian spelling is reasonably phonemic in practice.@@@@1@7@@danf@17-8-2009 10731300@unknown@formal@none@1@S@It is in fact a balance among phonemics, morphology, etymology, and grammar; and, like that of most living languages, has its share of inconsistencies and controversial points.@@@@1@27@@danf@17-8-2009 10731310@unknown@formal@none@1@S@A 
number of rigid [[spelling rule]]s introduced between the 1880s and 1910s have been responsible for the latter whilst trying to eliminate the former.@@@@1@24@@danf@17-8-2009 10731320@unknown@formal@none@1@S@The current spelling follows the major reform of 1918, and the final codification of 1956.@@@@1@15@@danf@17-8-2009 10731330@unknown@formal@none@1@S@An update proposed in the late 1990s has met a hostile reception, and has not been formally adopted.@@@@1@18@@danf@17-8-2009 10731340@unknown@formal@none@1@S@The punctuation, originally based on Byzantine Greek, was in the seventeenth and eighteenth centuries reformulated on the French and German models.@@@@1@21@@danf@17-8-2009 10731350@unknown@formal@none@1@S@==Sounds==@@@@1@1@@danf@17-8-2009 10731360@unknown@formal@none@1@S@The phonological system of Russian is inherited from [[Common Slavonic]], but underwent considerable modification in the early historical period, before being largely settled by about 1400.@@@@1@26@@danf@17-8-2009 10731370@unknown@formal@none@1@S@The language possesses five vowels, which are written with different letters depending on whether or not the preceding consonant is [[palatalization|palatalized]].@@@@1@21@@danf@17-8-2009 10731380@unknown@formal@none@1@S@The consonants typically come in plain vs. palatalized pairs, which are traditionally called ''hard'' and ''soft.''@@@@1@16@@danf@17-8-2009 10731390@unknown@formal@none@1@S@(The ''hard'' consonants are often [[velarization|velarized]], especially before back vowels, although in some dialects the velarization is limited to hard {{IPA|/l/}}).@@@@1@21@@danf@17-8-2009 10731400@unknown@formal@none@1@S@The standard language, based on the Moscow dialect, possesses heavy stress and moderate variation in pitch.@@@@1@16@@danf@17-8-2009 10731410@unknown@formal@none@1@S@Stressed vowels are somewhat lengthened, while unstressed vowels tend to be reduced to near-close vowels or an unclear [[schwa]].@@@@1@19@@danf@17-8-2009 10731420@unknown@formal@none@1@S@(See also: [[vowel reduction in Russian]].)@@@@1@6@@danf@17-8-2009 10731430@unknown@formal@none@1@S@The Russian [[syllable]] structure can be quite complex with both initial and final consonant clusters of up to 4 consecutive sounds.@@@@1@21@@danf@17-8-2009 10731440@unknown@formal@none@1@S@Using a formula with V standing for the nucleus (vowel) and C for each consonant the structure can be described as follows:@@@@1@22@@danf@17-8-2009 10731450@unknown@formal@none@1@S@(C)(C)(C)(C)V(C)(C)(C)(C)@@@@1@1@@danf@17-8-2009 10731460@unknown@formal@none@1@S@Clusters of four consonants are not very common, however, especially within a morpheme.@@@@1@13@@danf@17-8-2009 10731470@unknown@formal@none@1@S@===Consonants===@@@@1@1@@danf@17-8-2009 10731480@unknown@formal@none@1@S@Russian is notable for its distinction based on [[palatalization]] of most of the consonants.@@@@1@14@@danf@17-8-2009 10731490@unknown@formal@none@1@S@While {{IPA|/k/, /g/, /x/}} do have palatalized [[allophone]]s {{IPA|[kʲ, gʲ, xʲ]}}, only {{IPA|/kʲ/}} might be considered a phoneme, though it is marginal and generally not considered distinctive (the only native [[minimal pair]] which argues for {{IPA|/kʲ/}} to be a separate phoneme is "это ткёт"/"этот кот").@@@@1@45@@danf@17-8-2009 10731500@unknown@formal@none@1@S@Palatalization means that the center of the tongue is raised during and after the articulation of the consonant.@@@@1@18@@danf@17-8-2009 10731510@unknown@formal@none@1@S@In the case of {{IPA|/tʲ/ and /dʲ/}}, the tongue is raised enough to produce slight 
frication (affricate sounds).@@@@1@18@@danf@17-8-2009 10731520@unknown@formal@none@1@S@These sounds: {{IPA|/t, d, ʦ, s, z, n and rʲ/}} are [[dental consonant|dental]], that is pronounced with the tip of the tongue against the teeth rather than against the [[alveolar ridge]].@@@@1@31@@danf@17-8-2009 10731530@unknown@formal@none@1@S@==Grammar==@@@@1@1@@danf@17-8-2009 10731540@unknown@formal@none@1@S@Russian has preserved an [[Indo-European languages|Indo-European]] [[Synthetic language|synthetic]]-[[inflection]]al structure, although considerable leveling has taken place.@@@@1@15@@danf@17-8-2009 10731550@unknown@formal@none@1@S@Russian grammar encompasses@@@@1@3@@danf@17-8-2009 10731560@unknown@formal@none@1@S@* a highly [[Synthetic language|synthetic]] '''morphology'''@@@@1@6@@danf@17-8-2009 10731570@unknown@formal@none@1@S@* a '''syntax''' that, for the literary language, is the conscious fusion of three elements:@@@@1@15@@danf@17-8-2009 10731580@unknown@formal@none@1@S@** a polished [[vernacular]] foundation;@@@@1@5@@danf@17-8-2009 10731590@unknown@formal@none@1@S@** a [[Church Slavonic language|Church Slavonic]] inheritance;@@@@1@7@@danf@17-8-2009 10731600@unknown@formal@none@1@S@** a [[Western Europe]]an style.@@@@1@5@@danf@17-8-2009 10731610@unknown@formal@none@1@S@The spoken language has been influenced by the literary one, but continues to preserve characteristic forms.@@@@1@16@@danf@17-8-2009 10731620@unknown@formal@none@1@S@The dialects show various non-standard grammatical features, some of which are archaisms or descendants of old forms since discarded by the literary language.@@@@1@23@@danf@17-8-2009 10731630@unknown@formal@none@1@S@==Vocabulary==@@@@1@1@@danf@17-8-2009 10731640@unknown@formal@none@1@S@See [[History of the Russian language]] for an account of the successive foreign influences on the Russian language.@@@@1@18@@danf@17-8-2009 10731650@unknown@formal@none@1@S@The total number of words in Russian is difficult to reckon because of the ability to agglutinate and create manifold compounds, diminutives, etc. 
(see [[Russian grammar#Word Formation|Word Formation]] under [[Russian grammar]]).@@@@1@31@@danf@17-8-2009 10731660@unknown@formal@none@1@S@The number of listed words or entries in some of the major dictionaries published during the last two centuries, and the total vocabulary of [[Pushkin]] (who is credited with greatly augmenting and codifying literary Russian), are as follows:@@@@1@38@@danf@17-8-2009 10731670@unknown@formal@none@1@S@(As a historical aside, [[Vladimir Ivanovich Dal|Dahl]] was, in the second half of the nineteenth century, still insisting that the proper spelling of the adjective '''русский''', which was at that time applied uniformly to all the Orthodox Eastern Slavic subjects of the Empire, as well as to its one official language, be spelled '''руский''' with one s, in accordance with ancient tradition and what he termed the "spirit of the language".@@@@1@71@@danf@17-8-2009 10731680@unknown@formal@none@1@S@He was contradicted by the philologist Grot, who distinctly heard the s lengthened or doubled).@@@@1@15@@danf@17-8-2009 10731690@unknown@formal@none@1@S@=== Proverbs and sayings ===@@@@1@5@@danf@17-8-2009 10731700@unknown@formal@none@1@S@The Russian language is replete with many hundreds of proverbs ('''пословица''' {{IPA|[pɐˈslo.vʲɪ.ʦə]}}) and sayings ('''поговоркa''' {{IPA|[pə.gɐˈvo.rkə]}}).@@@@1@16@@danf@17-8-2009 10731710@unknown@formal@none@1@S@These were already tabulated by the seventeenth century, and collected and studied in the nineteenth and twentieth, with the folk-tales being an especially fertile source.@@@@1@25@@danf@17-8-2009 10731720@unknown@formal@none@1@S@==History and examples==@@@@1@3@@danf@17-8-2009 10731730@unknown@formal@none@1@S@The history of Russian language may be divided into the following periods.@@@@1@12@@danf@17-8-2009 10731740@unknown@formal@none@1@S@* [[History of the Russian language#Kievan period and feudal breakup|Kievan period and feudal breakup]]@@@@1@14@@danf@17-8-2009 10731750@unknown@formal@none@1@S@* [[History of the Russian language#The Tatar yoke and the Grand Duchy of Lithuania|The Tatar yoke and the Grand Duchy of Lithuania]]@@@@1@22@@danf@17-8-2009 10731760@unknown@formal@none@1@S@* [[History of the Russian language#The Moscovite period (15th–17th centuries)|The Moscovite period (15th–17th centuries)]]@@@@1@14@@danf@17-8-2009 10731770@unknown@formal@none@1@S@* [[History of the Russian language#Empire (18th–19th centuries)|Empire (18th–19th centuries)]]@@@@1@10@@danf@17-8-2009 10731780@unknown@formal@none@1@S@* [[History of the Russian language#Soviet period and beyond (20th century)|Soviet period and beyond (20th century)]]@@@@1@16@@danf@17-8-2009 10731790@unknown@formal@none@1@S@Judging by the historical records, by approximately 1000 AD the predominant ethnic group over much of modern European [[Russia]], [[Ukraine]], and [[Belarus]] was the Eastern branch of the [[Slavic peoples|Slavs]], speaking a closely related group of dialects.@@@@1@37@@danf@17-8-2009 10731800@unknown@formal@none@1@S@The political unification of this region into [[Kievan Rus']] in about 880, from which modern Russia, Ukraine and Belarus trace their origins, established [[Old East Slavic]] as a literary and commercial language.@@@@1@32@@danf@17-8-2009 10731810@unknown@formal@none@1@S@It was soon followed by the adoption of [[Christianity]] in 988 and the introduction of the South Slavic [[Old Church Slavonic]] as the liturgical and official language.@@@@1@27@@danf@17-8-2009 10731820@unknown@formal@none@1@S@Borrowings and [[calque]]s from Byzantine 
[[Greek language|Greek]] began to enter the [[Old East Slavic]] and spoken dialects at this time, which in their turn modified the [[Old Church Slavonic]] as well.@@@@1@31@@danf@17-8-2009 10731830@unknown@formal@none@1@S@Dialectal differentiation accelerated after the breakup of [[Kievan Rus]] in approximately 1100.@@@@1@12@@danf@17-8-2009 10731840@unknown@formal@none@1@S@[[Ruthenian language|Ruthenian]] emerged on the territories of modern [[Belarus]] and [[Ukraine]], while [[History of the Russian language|medieval Russian]] emerged in modern [[Russia]].@@@@1@21@@danf@17-8-2009 10731850@unknown@formal@none@1@S@They had become clearly distinct by the 13th century, when that land was divided between the [[Grand Duchy of Lithuania]] in the west and, in the east, the independent Novgorod Feudal Republic together with small duchies that were vassals of the Tatars.@@@@1@41@@danf@17-8-2009 10731860@unknown@formal@none@1@S@The official language in Moscow and Novgorod, and later, in the growing Moscow Rus’, was [[Church Slavonic]], which evolved from [[Old Church Slavonic]] and remained [[Diglossia|the literary language]] until the Petrine age, when its usage shrank drastically to biblical and liturgical texts.@@@@1@42@@danf@17-8-2009 10731870@unknown@formal@none@1@S@Russian developed under the strong influence of Church Slavonic until the close of the seventeenth century; afterwards the influence reversed, leading to the corruption of liturgical texts.@@@@1@27@@danf@17-8-2009 10731880@unknown@formal@none@1@S@The political reforms of [[Peter I of Russia|Peter the Great]] were accompanied by a reform of the alphabet, and achieved their goal of secularization and Westernization.@@@@1@26@@danf@17-8-2009 10731890@unknown@formal@none@1@S@Blocks of specialized vocabulary were adopted from the languages of Western Europe.@@@@1@12@@danf@17-8-2009 10731900@unknown@formal@none@1@S@By 1800, a significant portion of the gentry spoke [[French language|French]], less often [[German language|German]], on an everyday basis.@@@@1@19@@danf@17-8-2009 10731910@unknown@formal@none@1@S@Many Russian novels of the 19th century, e.g. 
Lev Tolstoy’s "War and Peace", contain entire paragraphs and even pages in French with no translation given, with an assumption that educated readers would not need one.@@@@1@34@@danf@17-8-2009 10731920@unknown@formal@none@1@S@The modern literary language is usually considered to date from the time of [[Aleksandr Pushkin]] in the first third of the nineteenth century.@@@@1@23@@danf@17-8-2009 10731930@unknown@formal@none@1@S@Pushkin revolutionized Russian literature by rejecting archaic grammar and vocabulary (so-called "высокий стиль" — "high style") in favor of grammar and vocabulary found in the spoken language of the time.@@@@1@31@@danf@17-8-2009 10731940@unknown@formal@none@1@S@Even younger modern readers may experience only slight difficulties understanding some words in Pushkin’s texts, since only a few words used by Pushkin became archaic or changed meaning.@@@@1@29@@danf@17-8-2009 10731950@unknown@formal@none@1@S@On the other hand, many expressions used by Russian writers of the early 19th century, in particular Pushkin, [[Lermontov]], [[Gogol]], Griboyedov, became proverbs or sayings which can frequently be found even in modern Russian colloquial speech.@@@@1@37@@danf@17-8-2009 10731960@unknown@formal@none@1@S@The political upheavals of the early twentieth century and the wholesale changes of political ideology gave written Russian its modern appearance after the spelling reform of 1918.@@@@1@27@@danf@17-8-2009 10731970@unknown@formal@none@1@S@Political circumstances and Soviet accomplishments in military, scientific, and technological matters (especially cosmonautics) gave Russian worldwide prestige, especially during the middle third of the twentieth century.@@@@1@27@@danf@17-8-2009 10740010@unknown@formal@none@1@S@
Web search engine
@@@@1@3@@danf@17-8-2009 10740020@unknown@formal@none@1@S@A '''Web search engine''' is a [[search engine (computing)|search engine]] designed to search for information on the [[World Wide Web]].@@@@1@20@@danf@17-8-2009 10740030@unknown@formal@none@1@S@Information may consist of [[web page]]s, images and other types of files.@@@@1@12@@danf@17-8-2009 10740040@unknown@formal@none@1@S@Some search engines also mine data available in newsbooks, databases, or [[Web directory|open directories]].@@@@1@14@@danf@17-8-2009 10740050@unknown@formal@none@1@S@Unlike [[Web directories]], which are maintained by human editors, search engines operate algorithmically or are a mixture of [[algorithmic]] and human input.@@@@1@22@@danf@17-8-2009 10740060@unknown@formal@none@1@S@==History==@@@@1@1@@danf@17-8-2009 10740070@unknown@formal@none@1@S@Before there were search engines there was a complete list of all webservers.@@@@1@13@@danf@17-8-2009 10740080@unknown@formal@none@1@S@The list was edited by [[Tim Berners-Lee]] and hosted on the CERN webserver.@@@@1@13@@danf@17-8-2009 10740090@unknown@formal@none@1@S@One historical snapshot from 1992 remains.@@@@1@6@@danf@17-8-2009 10740100@unknown@formal@none@1@S@As more and more webservers went online the central list could not keep up.@@@@1@14@@danf@17-8-2009 10740110@unknown@formal@none@1@S@On the NCSA Site new servers were announced under the title "What's New!", but no complete listing existed any more.@@@@1@20@@danf@17-8-2009 10740120@unknown@formal@none@1@S@The very first tool used for searching on the (pre-web) Internet was [[Archie search engine|Archie]].@@@@1@15@@danf@17-8-2009 10740130@unknown@formal@none@1@S@The name stands for "archive" without the "v".@@@@1@8@@danf@17-8-2009 10740140@unknown@formal@none@1@S@It was created in 1990 by [[Alan Emtage]], a student at [[McGill University]] in Montreal.@@@@1@15@@danf@17-8-2009 10740150@unknown@formal@none@1@S@The program downloaded the directory listings of all the files located on public anonymous FTP ([[File Transfer Protocol]]) sites, creating a searchable database of file names; however, Archie did not index the contents of these sites.@@@@1@36@@danf@17-8-2009 10740160@unknown@formal@none@1@S@The rise of [[Gopher (protocol)|Gopher]] (created in 1991 by [[Mark McCahill]] at the [[University of Minnesota]]) led to two new search programs, [[Veronica (computer)|Veronica]] and [[Jughead (computer)|Jughead]].@@@@1@27@@danf@17-8-2009 10740170@unknown@formal@none@1@S@Like Archie, they searched the file names and titles stored in Gopher index systems.@@@@1@14@@danf@17-8-2009 10740180@unknown@formal@none@1@S@Veronica ('''V'''ery '''E'''asy '''R'''odent-'''O'''riented '''N'''et-wide '''I'''ndex to '''C'''omputerized '''A'''rchives) provided a keyword search of most Gopher menu titles in the entire Gopher listings.@@@@1@23@@danf@17-8-2009 10740190@unknown@formal@none@1@S@Jughead ('''J'''onzy's '''U'''niversal '''G'''opher '''H'''ierarchy '''E'''xcavation '''A'''nd '''D'''isplay) was a tool for obtaining menu information from specific Gopher servers.@@@@1@19@@danf@17-8-2009 10740200@unknown@formal@none@1@S@While the name of the search engine "[[Archie search engine|Archie]]" was not a reference to the [[Archie Comics|Archie comic book]] series, "[[Veronica Lodge|Veronica]]" and "[[Jughead Jones|Jughead]]" are characters in the series, thus referencing their predecessor.@@@@1@35@@danf@17-8-2009 10740210@unknown@formal@none@1@S@The first Web search engine was Wandex, a now-defunct index collected by the [[World Wide Web 
Wanderer]], a [[web crawler]] developed by Matthew Gray at [[Massachusetts Institute of Technology|MIT]] in 1993.@@@@1@31@@danf@17-8-2009 10740220@unknown@formal@none@1@S@Another very early search engine, [[Aliweb]], also appeared in 1993.@@@@1@10@@danf@17-8-2009 10740230@unknown@formal@none@1@S@[[JumpStation]] (released in early 1994) used a crawler to find web pages for searching, but search was limited to the title of web pages only.@@@@1@25@@danf@17-8-2009 10740240@unknown@formal@none@1@S@One of the first "full text" crawler-based search engines was [[WebCrawler]], which came out in 1994.@@@@1@16@@danf@17-8-2009 10740250@unknown@formal@none@1@S@Unlike its predecessors, it let users search for any word in any webpage, which became the standard for all major search engines since.@@@@1@23@@danf@17-8-2009 10740260@unknown@formal@none@1@S@It was also the first one to be widely known by the public.@@@@1@13@@danf@17-8-2009 10740270@unknown@formal@none@1@S@Also in 1994 [[Lycos]] (which started at [[Carnegie Mellon University]]) was launched, and became a major commercial endeavor.@@@@1@18@@danf@17-8-2009 10740280@unknown@formal@none@1@S@Soon after, many search engines appeared and vied for popularity.@@@@1@10@@danf@17-8-2009 10740290@unknown@formal@none@1@S@These included [[Magellan]], [[Excite]], [[Infoseek]], [[Inktomi]], [[Northern Light Group|Northern Light]], and [[AltaVista]].@@@@1@12@@danf@17-8-2009 10740300@unknown@formal@none@1@S@[[Yahoo!]] was among the most popular ways for people to find web pages of interest, but its search function operated on its [[web directory]], rather than full-text copies of web pages.@@@@1@31@@danf@17-8-2009 10740310@unknown@formal@none@1@S@Information seekers could also browse the directory instead of doing a keyword-based search.@@@@1@13@@danf@17-8-2009 10740320@unknown@formal@none@1@S@In 1996, [[Netscape]] was looking to give a single search engine an exclusive deal to be their featured search engine.@@@@1@20@@danf@17-8-2009 10740330@unknown@formal@none@1@S@There was so much interest that instead a deal was struck with Netscape by 5 of the major search engines, where for $5Million per year each search engine would be in a rotation on the Netscape search engine page.@@@@1@39@@danf@17-8-2009 10740340@unknown@formal@none@1@S@These five engines were: [[Yahoo!]], [[Magellan]], [[Lycos]], [[Infoseek]] and [[Excite]].@@@@1@10@@danf@17-8-2009 10740350@unknown@formal@none@1@S@Search engines were also known as some of the brightest stars in the Internet investing frenzy that occurred in the late 1990s.@@@@1@22@@danf@17-8-2009 10740360@unknown@formal@none@1@S@Several companies entered the market spectacularly, receiving record gains during their [[initial public offering]]s.@@@@1@14@@danf@17-8-2009 10740370@unknown@formal@none@1@S@Some have taken down their public search engine, and are marketing enterprise-only editions, such as Northern Light.@@@@1@17@@danf@17-8-2009 10740380@unknown@formal@none@1@S@Many search engine companies were caught up in the [[dot-com bubble]], a speculation-driven market boom that peaked in 1999 and ended in 2001.@@@@1@23@@danf@17-8-2009 10740390@unknown@formal@none@1@S@Around 2000, the [[Google Search|Google search engine]] rose to prominence.@@@@1@10@@danf@17-8-2009 10740400@unknown@formal@none@1@S@The company achieved better results for many searches with an innovation called [[PageRank]].@@@@1@13@@danf@17-8-2009 10740410@unknown@formal@none@1@S@This iterative algorithm ranks web pages based on the number and PageRank of other web 
sites and pages that link there, on the premise that good or desirable pages are linked to more than others.@@@@1@35@@danf@17-8-2009 10740420@unknown@formal@none@1@S@Google also maintained a minimalist interface to its search engine.@@@@1@10@@danf@17-8-2009 10740430@unknown@formal@none@1@S@In contrast, many of its competitors embedded a search engine in a [[web portal]].@@@@1@14@@danf@17-8-2009 10740440@unknown@formal@none@1@S@By 2000, Yahoo was providing search services based on [[Inktomi]]'s search engine.@@@@1@12@@danf@17-8-2009 10740450@unknown@formal@none@1@S@Yahoo! acquired [[Inktomi]] in 2002, and [[Overture]] (which owned [[AlltheWeb]] and [[AltaVista]]) in 2003.@@@@1@14@@danf@17-8-2009 10740460@unknown@formal@none@1@S@Yahoo! switched to Google's search engine until 2004, when it launched its own search engine based on the combined technologies of its acquisitions.@@@@1@23@@danf@17-8-2009 10740470@unknown@formal@none@1@S@Microsoft first launched MSN Search (since re-branded [[Live Search]]) in the fall of 1998 using search results from [[Inktomi]].@@@@1@19@@danf@17-8-2009 10740480@unknown@formal@none@1@S@In early 1999 the site began to display listings from [[Looksmart]] blended with results from [[Inktomi]] except for a short time in 1999 when results from [[AltaVista]] were used instead.@@@@1@30@@danf@17-8-2009 10740490@unknown@formal@none@1@S@In 2004, Microsoft began a transition to its own search technology, powered by its own [[web crawler]] (called [[msnbot]]).@@@@1@19@@danf@17-8-2009 10740500@unknown@formal@none@1@S@As of late 2007, Google was by far the most popular Web search engine worldwide.@@@@1@15@@danf@17-8-2009 10740510@unknown@formal@none@1@S@A number of country-specific search engine companies have become prominent; for example [[Baidu]] is the most popular search engine in the [[People's Republic of China]] and [[guruji.com]] in [[India]].@@@@1@29@@danf@17-8-2009 10740520@unknown@formal@none@1@S@==How Web search engines work==@@@@1@5@@danf@17-8-2009 10740530@unknown@formal@none@1@S@A search engine operates, in the following order@@@@1@8@@danf@17-8-2009 10740540@unknown@formal@none@1@S@# [[Web crawling]]@@@@1@3@@danf@17-8-2009 10740550@unknown@formal@none@1@S@# [[Index (search engine)|Indexing]]@@@@1@4@@danf@17-8-2009 10740560@unknown@formal@none@1@S@# [[Web search query|Searching]]@@@@1@4@@danf@17-8-2009 10740570@unknown@formal@none@1@S@Web search engines work by storing information about many web pages, which they retrieve from the WWW itself.@@@@1@18@@danf@17-8-2009 10740580@unknown@formal@none@1@S@These pages are retrieved by a [[Web crawler]] (sometimes also known as a spider) — an automated Web browser which follows every link it sees.@@@@1@25@@danf@17-8-2009 10740590@unknown@formal@none@1@S@Exclusions can be made by the use of [[robots.txt]].@@@@1@9@@danf@17-8-2009 10740600@unknown@formal@none@1@S@The contents of each page are then analyzed to determine how it should be [[Search engine indexing|indexed]] (for example, words are extracted from the titles, headings, or special fields called [[meta tags]]).@@@@1@32@@danf@17-8-2009 10740610@unknown@formal@none@1@S@Data about web pages are stored in an index database for use in later queries.@@@@1@15@@danf@17-8-2009 10740620@unknown@formal@none@1@S@Some search engines, such as [[Google]], store all or part of the source page (referred to as a [[web cache|cache]]) as well as information about the web pages, whereas others, such as [[AltaVista]], store every word of every page they find.@@@@1@41@@danf@17-8-2009 
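The indexing and boolean retrieval steps described here can be illustrated with a short sketch. The following Python fragment is a toy example only: the page contents, the inverted_index structure, and the search_and helper are invented for illustration and do not correspond to any particular engine's implementation; real systems add crawling, tokenization rules, ranking (such as PageRank), caching, and persistent storage.
 # Toy sketch (not any real engine's code): build an inverted index from a few
 # hypothetical pages, then answer a boolean AND query against it.
 pages = {
     "page1.html": "web search engines index web pages",
     "page2.html": "a web crawler retrieves pages for the index",
     "page3.html": "directories are maintained by human editors",
 }
 inverted_index = {}                      # word -> set of page identifiers
 for url, text in pages.items():
     for word in text.lower().split():    # crude tokenization, for illustration only
         inverted_index.setdefault(word, set()).add(url)
 def search_and(*terms):
     """Return the pages containing every query term (boolean AND)."""
     hits = [inverted_index.get(t.lower(), set()) for t in terms]
     return set.intersection(*hits) if hits else set()
 print(search_and("web", "index"))        # e.g. {'page1.html', 'page2.html'}
A production engine would replace the in-memory dictionary with a persistent, compressed index and would order the intersected results by a relevance score, as discussed below.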
10740630@unknown@formal@none@1@S@This cached page always holds the actual search text since it is the one that was actually indexed, so it can be very useful when the content of the current page has been updated and the search terms are no longer in it.@@@@1@43@@danf@17-8-2009 10740640@unknown@formal@none@1@S@This problem might be considered to be a mild form of [[linkrot]], and Google's handling of it increases [[usability]] by satisfying [[user expectations]] that the search terms will be on the returned webpage.@@@@1@33@@danf@17-8-2009 10740650@unknown@formal@none@1@S@This satisfies the [[principle of least astonishment]] since the user normally expects the search terms to be on the returned pages.@@@@1@21@@danf@17-8-2009 10740660@unknown@formal@none@1@S@Increased search relevance makes these cached pages very useful, even beyond the fact that they may contain data that may no longer be available elsewhere.@@@@1@25@@danf@17-8-2009 10740670@unknown@formal@none@1@S@When a user enters a [[web search query|query]] into a search engine (typically by using [[Keyword (Internet search)|key word]]s), the engine examines its [[inverted index|index]] and provides a listing of best-matching web pages according to its criteria, usually with a short summary containing the document's title and sometimes parts of the text.@@@@1@52@@danf@17-8-2009 10740680@unknown@formal@none@1@S@Most search engines support the use of the [[boolean operators]] AND, OR and NOT to further specify the [[web search query|search query]].@@@@1@22@@danf@17-8-2009 10740690@unknown@formal@none@1@S@Some search engines provide an advanced feature called [[Proximity search (text)|proximity search]] which allows users to define the distance between keywords.@@@@1@21@@danf@17-8-2009 10740700@unknown@formal@none@1@S@The usefulness of a search engine depends on the [[relevance (information retrieval)|relevance]] of the '''result set''' it gives back.@@@@1@19@@danf@17-8-2009 10740710@unknown@formal@none@1@S@While there may be millions of webpages that include a particular word or phrase, some pages may be more relevant, popular, or authoritative than others.@@@@1@25@@danf@17-8-2009 10740720@unknown@formal@none@1@S@Most search engines employ methods to [[rank order|rank]] the results to provide the "best" results first.@@@@1@16@@danf@17-8-2009 10740730@unknown@formal@none@1@S@How a search engine decides which pages are the best matches, and what order the results should be shown in, varies widely from one engine to another.@@@@1@27@@danf@17-8-2009 10740740@unknown@formal@none@1@S@The methods also change over time as Internet usage changes and new techniques evolve.@@@@1@14@@danf@17-8-2009 10740750@unknown@formal@none@1@S@Most Web search engines are commercial ventures supported by [[advertising]] revenue and, as a result, some employ the controversial practice of allowing advertisers to pay money to have their listings ranked higher in search results.@@@@1@35@@danf@17-8-2009 10740760@unknown@formal@none@1@S@Those search engines which do not accept money for their search engine results make money by running search related ads alongside the regular search engine results.@@@@1@26@@danf@17-8-2009 10740770@unknown@formal@none@1@S@The search engines make money every time someone clicks on one of these ads.@@@@1@14@@danf@17-8-2009 10740780@unknown@formal@none@1@S@The vast majority of search engines are run by private companies using proprietary algorithms and closed databases, though [[List of search engines#Open source search engines|some]] 
are open source.@@@@1@28@@danf@17-8-2009 10740790@unknown@formal@none@1@S@Revenue in the web search portals industry is projected to grow in 2008 by 13.4 percent, with broadband connections expected to rise by 15.1 percent.@@@@1@25@@danf@17-8-2009 10740800@unknown@formal@none@1@S@Between 2008 and 2012, industry revenue is projected to rise by 56 percent as Internet penetration still has some way to go to reach full saturation in American households.@@@@1@29@@danf@17-8-2009 10740810@unknown@formal@none@1@S@Furthermore, broadband services are projected to account for an ever-increasing share of domestic Internet users, rising to 118.7 million by 2012, with a growing share accounted for by fiber-optic and high-speed cable lines.@@@@1@35@@danf@17-8-2009 10750010@unknown@formal@none@1@S@
Semantics
@@@@1@1@@danf@17-8-2009 10750020@unknown@formal@none@1@S@'''Semantics''' is the study of meaning in communication.@@@@1@8@@danf@17-8-2009 10750030@unknown@formal@none@1@S@The word derives from [[Greek language|Greek]] ''σημαντικός'' (''semantikos''), "significant", from ''σημαίνω'' (''semaino''), "to signify, to indicate" and that from ''σήμα'' (''sema''), "sign, mark, token".@@@@1@24@@danf@17-8-2009 10750040@unknown@formal@none@1@S@In [[linguistics]] it is the study of interpretation of signs as used by [[agent]]s or [[community|communities]] within particular circumstances and contexts.@@@@1@21@@danf@17-8-2009 10750050@unknown@formal@none@1@S@It has related meanings in several other fields.@@@@1@8@@danf@17-8-2009 10750060@unknown@formal@none@1@S@Semanticists differ on what constitutes [[Meaning (linguistics)|meaning]] in an expression.@@@@1@10@@danf@17-8-2009 10750070@unknown@formal@none@1@S@For example, in the sentence, "John loves a bagel", the word ''bagel'' may refer to the object itself, which is its ''literal'' meaning or ''[[denotation]]'', but it may also refer to many other figurative associations, such as how it meets John's hunger, etc., which may be its ''[[connotation]]''.@@@@1@48@@danf@17-8-2009 10750080@unknown@formal@none@1@S@Traditionally, the [[formal semantic]] view restricts semantics to its literal meaning, and relegates all figurative associations to [[pragmatics]], but this distinction is increasingly difficult to defend.@@@@1@26@@danf@17-8-2009 10750090@unknown@formal@none@1@S@The degree to which a theorist subscribes to the literal-figurative distinction decreases as one moves from the [[formal semantic]], [[semiotic]], [[pragmatic]], to the [[cognitive semantic]] traditions.@@@@1@26@@danf@17-8-2009 10750100@unknown@formal@none@1@S@The word ''semantic'' in its modern sense is considered to have first appeared in [[French language|French]] as ''sémantique'' in [[Michel Bréal]]'s 1897 book, ''Essai de sémantique'.@@@@1@26@@danf@17-8-2009 10750110@unknown@formal@none@1@S@In [[International Scientific Vocabulary]] semantics is also called ''[[semasiology]]''.@@@@1@9@@danf@17-8-2009 10750120@unknown@formal@none@1@S@The discipline of Semantics is distinct from [[General semantics|Alfred Korzybski's General Semantics]], which is a system for looking at non-immediate, or abstract meanings.@@@@1@23@@danf@17-8-2009 10750130@unknown@formal@none@1@S@==Linguistics==@@@@1@1@@danf@17-8-2009 10750140@unknown@formal@none@1@S@In [[linguistics]], '''semantics''' is the subfield that is devoted to the study of meaning, as inherent at the levels of words, phrases, sentences, and even larger units of [[discourse]] (referred to as ''texts'').@@@@1@33@@danf@17-8-2009 10750150@unknown@formal@none@1@S@The basic area of study is the meaning of [[sign (semiotics)|sign]]s, and the study of relations between different linguistic units: [[homonym]]y, [[synonym]]y, [[antonym]]y, [[polysemy]], [[paronyms]], [[hypernym]]y, [[hyponym]]y, [[meronymy]], [[metonymy]], [[holonymy]], [[exocentric]]ity / [[endocentric]]ity, linguistic [[compound (linguistics)|compounds]].@@@@1@36@@danf@17-8-2009 10750160@unknown@formal@none@1@S@A key concern is how meaning attaches to larger chunks of text, possibly as a result of the composition from smaller units of meaning.@@@@1@24@@danf@17-8-2009 10750170@unknown@formal@none@1@S@Traditionally, semantics has included the study of connotative ''[[word sense|sense]]'' and denotative ''[[reference]]'', [[truth condition]]s, [[argument structure]], [[thematic role]]s, 
[[discourse analysis]], and the linkage of all of these to syntax.@@@@1@30@@danf@17-8-2009 10750180@unknown@formal@none@1@S@[[Formal semantics|Formal semanticists]] are concerned with the modeling of meaning in terms of the semantics of logic.@@@@1@17@@danf@17-8-2009 10750190@unknown@formal@none@1@S@Thus the sentence ''John loves a bagel'' above can be broken down into its constituents (signs), of which the unit ''loves'' may serve as both syntactic and semantic [[head (linguistics)|head]].@@@@1@30@@danf@17-8-2009 10750200@unknown@formal@none@1@S@In the late 1960s, [[Richard Montague]] proposed a system for defining semantic entries in the lexicon in terms of [[lambda calculus]].@@@@1@21@@danf@17-8-2009 10750210@unknown@formal@none@1@S@Thus, the syntactic [[parsing|parse]] of the sentence above would now indicate ''loves'' as the head, and its entry in the lexicon would point to the arguments as the agent, ''John'', and the object, ''bagel'', with a special role for the article "a" (which Montague called a quantifier).@@@@1@47@@danf@17-8-2009 10750220@unknown@formal@none@1@S@This resulted in the sentence being associated with the logical predicate ''loves (John, bagel)'', thus linking semantics to [[categorial grammar]] models of [[syntax]].@@@@1@23@@danf@17-8-2009 10750230@unknown@formal@none@1@S@The logical predicate thus obtained would be elaborated further, e.g. using truth theory models, which ultimately relate meanings to a set of [[Tarski]]ian universals, which may lie outside the logic.@@@@1@30@@danf@17-8-2009 10750240@unknown@formal@none@1@S@The notion of such meaning atoms or primitives are basic to the [[language of thought]] hypothesis from the 70s.@@@@1@19@@danf@17-8-2009 10750250@unknown@formal@none@1@S@Despite its elegance, [[Montague grammar]] was limited by the context-dependent variability in word sense, and led to several attempts at incorporating context, such as :@@@@1@25@@danf@17-8-2009 10750260@unknown@formal@none@1@S@*[[situation semantics]] ('80s): Truth-values are incomplete, they get assigned based on context@@@@1@12@@danf@17-8-2009 10750270@unknown@formal@none@1@S@*[[generative lexicon]] ('90s): categories (types) are incomplete, and get assigned based on context@@@@1@13@@danf@17-8-2009 10750280@unknown@formal@none@1@S@===The dynamic turn in semantics===@@@@1@5@@danf@17-8-2009 10750290@unknown@formal@none@1@S@In the [[Noam Chomsky|Chomskian]] tradition in linguistics there was no mechanism for the learning of semantic relations, and the [[Psychological nativism|nativist]] view considered all semantic notions as inborn.@@@@1@28@@danf@17-8-2009 10750300@unknown@formal@none@1@S@Thus, even novel concepts were proposed to have been dormant in some sense.@@@@1@13@@danf@17-8-2009 10750310@unknown@formal@none@1@S@This traditional view was also unable to address many issues such as [[metaphor]] or associative meanings, and [[semantic change]], where meanings within a linguistic community change over time, and [[qualia]] or subjective experience.@@@@1@33@@danf@17-8-2009 10750320@unknown@formal@none@1@S@Another issue not addressed by the nativist model was how perceptual cues are combined in thought, e.g. 
in [[mental rotation]].@@@@1@20@@danf@17-8-2009 10750330@unknown@formal@none@1@S@This traditional view of semantics, as an innate finite meaning inherent in a [[lexical unit]] that can be composed to generate meanings for larger chunks of discourse, is now being fiercely debated in the emerging domain of [[cognitive linguistics]] and also in the non-[[Jerry Fodor|Fodorian]] camp in [[Philosophy of Language]].@@@@1@50@@danf@17-8-2009 10750340@unknown@formal@none@1@S@The challenge is motivated by@@@@1@5@@danf@17-8-2009 10750350@unknown@formal@none@1@S@* factors internal to language, such as the problem of resolving [[indexical]] or [[anaphora]] (e.g. ''this x'', ''him'', ''last week'').@@@@1@20@@danf@17-8-2009 10750360@unknown@formal@none@1@S@In these situations "context" serves as the input, but the interpreted utterance also modifies the context, so it is also the output.@@@@1@22@@danf@17-8-2009 10750370@unknown@formal@none@1@S@Thus, the interpretation is necessarily dynamic and the meaning of sentences is viewed as context-change potentials instead of [[propositions]].@@@@1@19@@danf@17-8-2009 10750380@unknown@formal@none@1@S@* factors external to language, i.e. language is not a set of labels stuck on things, but "a toolbox, the importance of whose elements lie in the way they function rather than their attachments to things."@@@@1@36@@danf@17-8-2009 10750390@unknown@formal@none@1@S@This view reflects the position of the later [[Wittgenstein]] and his famous ''game'' example, and is related to the positions of [[Willard Van Orman Quine|Quine]], [[Donald Davidson (philosopher)|Davidson]], and others.@@@@1@30@@danf@17-8-2009 10750400@unknown@formal@none@1@S@A concrete example of the latter phenomenon is semantic [[underspecification]] — meanings are not complete without some elements of context.@@@@1@20@@danf@17-8-2009 10750410@unknown@formal@none@1@S@To take an example of a single word, "red", its meaning in a phrase such as ''red book'' is similar to many other usages, and can be viewed as compositional.@@@@1@30@@danf@17-8-2009 10750420@unknown@formal@none@1@S@However, the colours implied in phrases such as "red wine" (very dark), and "red hair" (coppery), or "red soil", or "red skin" are very different.@@@@1@25@@danf@17-8-2009 10750430@unknown@formal@none@1@S@Indeed, these colours by themselves would not be called "red" by native speakers.@@@@1@13@@danf@17-8-2009 10750440@unknown@formal@none@1@S@These instances are contrastive, so "red wine" is so called only in comparison with the other kind of wine (which also is not "white" for the same reasons).@@@@1@28@@danf@17-8-2009 10750450@unknown@formal@none@1@S@This view goes back to [[Ferdinand de Saussure|de Saussure]]:@@@@1@9@@danf@17-8-2009 10750460@unknown@formal@none@1@S@:Each of a set of synonyms like ''redouter'' ('to dread'), ''craindre'' ('to fear'), ''avoir peur'' ('to be afraid') has its particular value only because they stand in contrast with one another.@@@@1@31@@danf@17-8-2009 10750470@unknown@formal@none@1@S@No word has a value that can be identified independently of what else is in its vicinity.@@@@1@17@@danf@17-8-2009 10750480@unknown@formal@none@1@S@and may go back to earlier [[India]]n views on language, especially the [[Nyaya]] view of words as [[Semantic indicator|indicators]] and not carriers of meaning.@@@@1@24@@danf@17-8-2009 10750490@unknown@formal@none@1@S@An attempt to defend a system based on propositional meaning for semantic underspecification can be found in the [[Generative Lexicon]] model of [[James 
Pustejovsky]], who extends contextual operations (based on type shifting) into the lexicon.@@@@1@35@@danf@17-8-2009 10750500@unknown@formal@none@1@S@Thus meanings are generated on the fly based on finite context.@@@@1@11@@danf@17-8-2009 10750510@unknown@formal@none@1@S@===Prototype theory===@@@@1@2@@danf@17-8-2009 10750520@unknown@formal@none@1@S@Another set of concepts related to fuzziness in semantics is based on [[Prototype Theory|prototype]]s.@@@@1@14@@danf@17-8-2009 10750530@unknown@formal@none@1@S@The work of [[Eleanor Rosch]] and [[George Lakoff]] in the 1970s led to a view that natural categories are not characterizable in terms of necessary and sufficient conditions, but are graded (fuzzy at their boundaries) and inconsistent as to the status of their constituent members.@@@@1@45@@danf@17-8-2009 10750540@unknown@formal@none@1@S@Systems of categories are not objectively "out there" in the world but are rooted in people's experience.@@@@1@17@@danf@17-8-2009 10750550@unknown@formal@none@1@S@These categories evolve as [[learning theory (education)|learned]] concepts of the world — meaning is not an objective truth, but a subjective construct, learned from experience, and language arises out of the "grounding of our conceptual systems in shared [[embodied philosophy|embodiment]] and bodily experience".@@@@1@43@@danf@17-8-2009 10750560@unknown@formal@none@1@S@A corollary of this is that the conceptual categories (i.e. the lexicon) will not be identical for different cultures, or indeed, for every individual in the same culture.@@@@1@28@@danf@17-8-2009 10750570@unknown@formal@none@1@S@This leads to another debate (see the [[Whorf-Sapir hypothesis]] or [[Eskimo words for snow]]).@@@@1@14@@danf@17-8-2009 10750580@unknown@formal@none@1@S@==Computer science==@@@@1@2@@danf@17-8-2009 10750590@unknown@formal@none@1@S@In [[computer science]], where it is considered as an application of [[mathematical logic]], semantics reflects the meaning of programs or functions.@@@@1@21@@danf@17-8-2009 10750600@unknown@formal@none@1@S@In this regard, semantics permits programs to be separated into their syntactical part (grammatical structure) and their semantic part (meaning).@@@@1@20@@danf@17-8-2009 10750610@unknown@formal@none@1@S@For instance, the following statements use different syntaxes (languages), but result in the same semantic:@@@@1@15@@danf@17-8-2009 10750620@unknown@formal@none@1@S@* x += y; ([[C (programming language)|C]], [[Java (programming language)|Java]], etc.)@@@@1@11@@danf@17-8-2009 10750630@unknown@formal@none@1@S@* x := x + y; ([[Pascal (programming language)|Pascal]])@@@@1@9@@danf@17-8-2009 10750640@unknown@formal@none@1@S@* Let x = x + y; (early [[BASIC]])@@@@1@9@@danf@17-8-2009 10750650@unknown@formal@none@1@S@* x = x + y (most BASIC dialects, [[Fortran]])@@@@1@10@@danf@17-8-2009 10750660@unknown@formal@none@1@S@Generally these operations would all perform an arithmetical addition of 'y' to 'x' and store the result in a variable 'x'.@@@@1@21@@danf@17-8-2009 10750670@unknown@formal@none@1@S@Semantics for computer applications falls into three categories:@@@@1@8@@danf@17-8-2009 10750680@unknown@formal@none@1@S@* [[Operational semantics]]: The meaning of a construct is specified by the computation it induces when it is executed on a machine.@@@@1@22@@danf@17-8-2009 10750690@unknown@formal@none@1@S@In particular, it is of interest ''how'' the effect of a computation is produced.@@@@1@14@@danf@17-8-2009 10750700@unknown@formal@none@1@S@* [[Denotational semantics]]: Meanings are modelled 
by mathematical objects that represent the effect of executing the constructs.@@@@1@17@@danf@17-8-2009 10750710@unknown@formal@none@1@S@Thus ''only'' the effect is of interest, not how it is obtained.@@@@1@12@@danf@17-8-2009 10750720@unknown@formal@none@1@S@* [[Axiomatic semantics]]: Specific properties of the effect of executing the constructs are expressed as ''assertions''.@@@@1@16@@danf@17-8-2009 10750730@unknown@formal@none@1@S@Thus there may be aspects of the executions that are ignored.@@@@1@11@@danf@17-8-2009 10750740@unknown@formal@none@1@S@The '''[[Semantic Web]]''' refers to the extension of the [[World Wide Web]] through the embedding of additional semantic [[metadata]]; see also@@@@1@20@@danf@17-8-2009 10750750@unknown@formal@none@1@S@[[Web Ontology Language]] (OWL).@@@@1@4@@danf@17-8-2009 10750760@unknown@formal@none@1@S@==Psychology==@@@@1@1@@danf@17-8-2009 10750770@unknown@formal@none@1@S@In [[psychology]], ''[[semantic memory]]'' is memory for meaning, in other words, the aspect of memory that preserves only the ''gist'', the general significance, of remembered experience, while [[episodic memory]] is memory for the ephemeral details, the individual features, or the unique particulars of experience.@@@@1@44@@danf@17-8-2009 10750780@unknown@formal@none@1@S@Word meanings are measured by the company they keep, that is, by the relationships among words themselves in a [[semantic network]].@@@@1@18@@danf@17-8-2009 10750790@unknown@formal@none@1@S@In a network created by people analyzing their understanding of the word (such as [[Wordnet]]), the links and decomposition structures of the network are few in number and kind, and include "part of", "kind of", and similar links.@@@@1@38@@danf@17-8-2009 10750800@unknown@formal@none@1@S@In automated [[ontologies]] the links are computed vectors without explicit meaning.@@@@1@11@@danf@17-8-2009 10750810@unknown@formal@none@1@S@Various automated technologies are being developed to compute the meaning of words: [[latent semantic indexing]] and [[support vector machines]] as well as [[natural language processing]], [[neural networks]] and [[predicate calculus]] techniques.@@@@1@31@@danf@17-8-2009 10750820@unknown@formal@none@1@S@Semantics has been reported to drive the course of psychotherapeutic interventions.@@@@1@11@@danf@17-8-2009 10750830@unknown@formal@none@1@S@Language structure can determine the treatment approach to drug-abusing patients.@@@@1@11@@danf@17-8-2009 10750840@unknown@formal@none@1@S@While working in Europe for the US Information Agency, American psychiatrist Dr. A. James Giannini reported semantic differences in medical approaches to addiction treatment.@@@@1@24@@danf@17-8-2009 10750850@unknown@formal@none@1@S@English-speaking countries used the term "drug dependence" to describe a rather passive pathology in their patients.@@@@1@17@@danf@17-8-2009 10750860@unknown@formal@none@1@S@As a result, the physician's role was more active.@@@@1@9@@danf@17-8-2009 10750870@unknown@formal@none@1@S@Southern European countries such as Italy and Yugoslavia utilized the concept of "tossicomania" (i.e. toxic mania) to describe a more active rather than passive role of the addict.@@@@1@28@@danf@17-8-2009 10750880@unknown@formal@none@1@S@As a result, the treating physician's role shifted to that of a more passive guide than that of an active interventionist.@@@@1@22@@danf@17-8-2009 10760010@unknown@formal@none@1@S@
Sentence (linguistics)
@@@@1@2@@danf@17-8-2009 10760020@unknown@formal@none@1@S@In [[linguistics]], a '''sentence''' is a grammatical unit of one or more words, bearing minimal syntactic relation to the words that precede or follow it, often preceded and followed in speech by pauses, having one of a small number of characteristic intonation patterns, and typically expressing an independent statement, question, request, command, etc.@@@@1@53@@danf@17-8-2009 10760030@unknown@formal@none@1@S@Sentences are generally characterized in most languages by the presence of a [[finite verb]], e.g. "[[The quick brown fox jumps over the lazy dog]]".@@@@1@24@@danf@17-8-2009 10760050@unknown@formal@none@1@S@==Components of a sentence==@@@@1@4@@danf@17-8-2009 10760060@unknown@formal@none@1@S@A simple ''complete sentence'' consists of a ''[[subject (grammar)|subject]]'' and a ''[[predicate (grammar)|predicate]]''.@@@@1@13@@danf@17-8-2009 10760070@unknown@formal@none@1@S@The subject is typically a [[noun phrase]], though other kinds of phrases (such as [[gerund]] phrases) work as well, and some languages allow subjects to be omitted.@@@@1@27@@danf@17-8-2009 10760080@unknown@formal@none@1@S@The predicate is a finite [[verb phrase]]: it is a finite verb together with zero or more [[object (grammar)|objects]], zero or more [[complement (linguistics)|complements]], and zero or more [[adverbial]]s.@@@@1@28@@danf@17-8-2009 10760090@unknown@formal@none@1@S@See also [[copula]] for the consequences of this verb on the theory of sentence structure.@@@@1@15@@danf@17-8-2009 10760100@unknown@formal@none@1@S@===Clauses===@@@@1@1@@danf@17-8-2009 10760110@unknown@formal@none@1@S@A [[clause]] consists of a subject and a verb.@@@@1@9@@danf@17-8-2009 10760120@unknown@formal@none@1@S@There are two types of clauses: independent and subordinate (dependent).@@@@1@10@@danf@17-8-2009 10760130@unknown@formal@none@1@S@An independent clause consists of a subject and a verb, and demonstrates a complete thought: for example, "I am sad."@@@@1@19@@danf@17-8-2009 10760140@unknown@formal@none@1@S@A subordinate clause consists of a subject and a verb, but demonstrates an incomplete thought: for example, "Because I had to move."@@@@1@22@@danf@17-8-2009 10760150@unknown@formal@none@1@S@==Classification==@@@@1@1@@danf@17-8-2009 10760160@unknown@formal@none@1@S@===By structure===@@@@1@2@@danf@17-8-2009 10760170@unknown@formal@none@1@S@One traditional scheme for classifying [[English language|English]] sentences is by the number and types of [[finite verb|finite]] [[clause]]s:@@@@1@18@@danf@17-8-2009 10760180@unknown@formal@none@1@S@* A ''[[simple sentence]]'' consists of a single [[independent clause]] with no [[dependent clause]]s.@@@@1@14@@danf@17-8-2009 10760190@unknown@formal@none@1@S@* A ''[[compound sentence (linguistics)|compound sentence]]'' consists of multiple independent clauses with no dependent clauses.@@@@1@15@@danf@17-8-2009 10760200@unknown@formal@none@1@S@These clauses are joined together using [[grammatical conjunction|conjunctions]], [[punctuation]], or both.@@@@1@11@@danf@17-8-2009 10760210@unknown@formal@none@1@S@* A ''[[complex sentence]]'' consists of one or more independent clauses with at least one dependent clause.@@@@1@17@@danf@17-8-2009 10760220@unknown@formal@none@1@S@* A ''[[complex-compound sentence]]'' (or ''compound-complex sentence'') consists of multiple independent clauses, at least one of which has at least one dependent clause.@@@@1@23@@danf@17-8-2009 10760230@unknown@formal@none@1@S@===By purpose===@@@@1@2@@danf@17-8-2009
10760240@unknown@formal@none@1@S@Sentences can also be classified based on their purpose:@@@@1@9@@danf@17-8-2009 10760250@unknown@formal@none@1@S@*A ''declarative sentence'' or ''declaration'', the most common type, commonly makes a statement: ''I am going home.''@@@@1@17@@danf@17-8-2009 10760260@unknown@formal@none@1@S@*A ''negative sentence'' or ''[[negation (linguistics)|negation]]'' denies that a statement is true: ''I am not going home.''@@@@1@17@@danf@17-8-2009 10760270@unknown@formal@none@1@S@*An ''interrogative sentence'' or ''[[question]]'' is commonly used to request information — ''When are you going to work?'' — but sometimes not; ''see'' [[rhetorical question]].@@@@1@25@@danf@17-8-2009 10760280@unknown@formal@none@1@S@*An ''exclamatory sentence'' or ''[[exclamation]]'' is generally a more emphatic form of statement: ''What a wonderful day this is!''@@@@1@19@@danf@17-8-2009 10760290@unknown@formal@none@1@S@===Major and minor sentences===@@@@1@4@@danf@17-8-2009 10760300@unknown@formal@none@1@S@A major sentence is a ''regular'' sentence; it has a [[subject (grammar)|subject]] and a [[predicate (grammar)|predicate]].@@@@1@16@@danf@17-8-2009 10760310@unknown@formal@none@1@S@For example: ''I have a ball.''@@@@1@6@@danf@17-8-2009 10760320@unknown@formal@none@1@S@In this sentence one can change the person: ''We have a ball.''@@@@1@12@@danf@17-8-2009 10760330@unknown@formal@none@1@S@However, a minor sentence is an irregular type of sentence.@@@@1@10@@danf@17-8-2009 10760340@unknown@formal@none@1@S@It does not contain a finite verb.@@@@1@7@@danf@17-8-2009 10760350@unknown@formal@none@1@S@For example, "Mary!"@@@@1@3@@danf@17-8-2009 10760360@unknown@formal@none@1@S@"Yes."@@@@1@1@@danf@17-8-2009 10760370@unknown@formal@none@1@S@"Coffee." etc.@@@@1@2@@danf@17-8-2009 10760380@unknown@formal@none@1@S@Other examples of minor sentences are headings (e.g. the heading of this entry), stereotyped expressions (''Hello!''), emotional expressions (''Wow!''), proverbs, etc.@@@@1@21@@danf@17-8-2009 10760390@unknown@formal@none@1@S@This can also include sentences which do not contain verbs (e.g. ''The more, the merrier.'') in order to intensify the meaning around the nouns (normally found in poetry and catchphrases).@@@@1@33@@danf@17-8-2009 10770010@unknown@formal@none@1@S@
Computer software
@@@@1@2@@danf@17-8-2009 10770020@unknown@formal@none@1@S@'''Computer software,''' or just '''software''' is a general term used to describe a collection of [[computer program]]s, [[procedures]] and documentation that perform some tasks on a computer system.@@@@1@28@@danf@17-8-2009 10770030@unknown@formal@none@1@S@The term includes [[application software]] such as [[word processor]]s which perform productive tasks for users, [[system software]] such as [[operating system]]s, which interface with [[hardware]] to provide the necessary services for application software, and [[middleware]] which controls and co-ordinates [[Distributed computing|distributed systems]].@@@@1@42@@danf@17-8-2009 10770040@unknown@formal@none@1@S@"Software" is sometimes used in a broader context to mean anything which is not hardware but which is ''used'' with hardware, such as film, tapes and records.@@@@1@27@@danf@17-8-2009 10770050@unknown@formal@none@1@S@==Relationship to computer hardware==@@@@1@4@@danf@17-8-2009 10770060@unknown@formal@none@1@S@[[Computer]] software is so called to distinguish it from [[computer hardware]], which encompasses the physical interconnections and devices required to store and execute (or run) the software.@@@@1@27@@danf@17-8-2009 10770070@unknown@formal@none@1@S@At the lowest level, software consists of a [[machine language]] specific to an individual processor.@@@@1@15@@danf@17-8-2009 10770080@unknown@formal@none@1@S@A machine language consists of groups of binary values signifying processor instructions which change the state of the computer from its preceding state.@@@@1@23@@danf@17-8-2009 10770090@unknown@formal@none@1@S@Software is an ordered sequence of instructions for changing the state of the computer hardware in a particular sequence.@@@@1@19@@danf@17-8-2009 10770100@unknown@formal@none@1@S@It is usually written in [[high-level programming language]]s that are easier and more efficient for humans to use (closer to [[natural language]]) than machine language.@@@@1@25@@danf@17-8-2009 10770110@unknown@formal@none@1@S@High-level languages are [[compiler|compiled]] or [[interpreter (computing)|interpreted]] into machine language object code.@@@@1@12@@danf@17-8-2009 10770120@unknown@formal@none@1@S@Software may also be written in an [[assembly language]], essentially, a mnemonic representation of a machine language using a natural language alphabet.@@@@1@22@@danf@17-8-2009 10770130@unknown@formal@none@1@S@Assembly language must be assembled into object code via an [[assembly language#Assembler|assembler]].@@@@1@12@@danf@17-8-2009 10770140@unknown@formal@none@1@S@The term "software" was first used in this sense by [[John W. 
Tukey]] in [[1958]].@@@@1@15@@danf@17-8-2009 10770150@unknown@formal@none@1@S@In [[computer science]] and [[software engineering]], '''computer software''' is all computer programs.@@@@1@12@@danf@17-8-2009 10770160@unknown@formal@none@1@S@The theory that is the basis for most modern software was first proposed by [[Alan Turing]] in his [[1936]] essay ''On Computable Numbers, with an Application to the Entscheidungsproblem''.@@@@1@28@@danf@17-8-2009 10770170@unknown@formal@none@1@S@==Types==@@@@1@1@@danf@17-8-2009 10770180@unknown@formal@none@1@S@Practical [[computer system]]s divide [[software system]]s into three major classes: [[system software]], [[programming software]] and [[application software]], although the distinction is arbitrary, and often blurred.@@@@1@25@@danf@17-8-2009 10770190@unknown@formal@none@1@S@*'''[[System software]]''' helps run the [[computer hardware]] and [[computer system]].@@@@1@10@@danf@17-8-2009 10770200@unknown@formal@none@1@S@It includes [[operating system]]s, [[device driver]]s, diagnostic tools, [[Server (computing)|server]]s, [[windowing system]]s, [[software utility|utilities]] and more.@@@@1@16@@danf@17-8-2009 10770210@unknown@formal@none@1@S@The purpose of systems software is to insulate the applications programmer as much as possible from the details of the particular computer complex being used, especially memory and other hardware features, and such accessory devices as communications, printers, readers, displays, keyboards, etc.@@@@1@43@@danf@17-8-2009 10770220@unknown@formal@none@1@S@*'''[[Programming software]]''' usually provides tools to assist a [[programmer]] in writing [[computer program]]s and software using different [[programming language]]s in a more convenient way.@@@@1@24@@danf@17-8-2009 10770230@unknown@formal@none@1@S@The tools include [[text editors]], [[compilers]], [[interpreter (computing)|interpreters]], [[linkers]], [[debuggers]], and so on.@@@@1@13@@danf@17-8-2009 10770240@unknown@formal@none@1@S@An [[Integrated development environment]] (IDE) merges those tools into a software bundle, and a programmer may not need to type multiple [[command]]s for compiling, interpreting, debugging, tracing, and so on, because the IDE usually has an advanced ''[[graphical user interface]]'', or GUI.@@@@1@41@@danf@17-8-2009 10770250@unknown@formal@none@1@S@*'''[[Application software]]''' allows end users to accomplish one or more specific (non-computer related) [[task]]s.@@@@1@14@@danf@17-8-2009 10770260@unknown@formal@none@1@S@Typical applications include [[Industry|industrial]] [[automation]], [[business software]], [[educational software]], [[medical software]], [[database]]s, and [[computer games]].@@@@1@15@@danf@17-8-2009 10770270@unknown@formal@none@1@S@Businesses are probably the biggest users of application software, but almost every field of human activity now uses some form of application software.@@@@1@23@@danf@17-8-2009 10770280@unknown@formal@none@1@S@==Program and library==@@@@1@3@@danf@17-8-2009 10770290@unknown@formal@none@1@S@A [[Computer program|program]] may not be sufficiently complete for execution by a [[computer]].@@@@1@13@@danf@17-8-2009 10770300@unknown@formal@none@1@S@In particular, it may require additional software from a [[software library]] in order to be complete.@@@@1@16@@danf@17-8-2009 10770310@unknown@formal@none@1@S@Such a library may include software components used by [[stand-alone]] programs, but which cannot work on their own.@@@@1@18@@danf@17-8-2009 10770320@unknown@formal@none@1@S@Thus, programs may 
include standard routines that are common to many programs, extracted from these libraries.@@@@1@16@@danf@17-8-2009 10770330@unknown@formal@none@1@S@Libraries may also ''include'' 'stand-alone' programs which are activated by some [[event-driven programming|computer event]] and/or perform some function (e.g., of computer 'housekeeping') but do not return data to their calling program.@@@@1@31@@danf@17-8-2009 10770340@unknown@formal@none@1@S@Libraries may be [[Execution (computers)|called]] by one to many other programs; programs may call zero to many other programs.@@@@1@19@@danf@17-8-2009 10770350@unknown@formal@none@1@S@==Three layers==@@@@1@2@@danf@17-8-2009 10770360@unknown@formal@none@1@S@Users often see things differently than programmers.@@@@1@7@@danf@17-8-2009 10770370@unknown@formal@none@1@S@People who use modern general purpose computers (as opposed to [[embedded system]]s, [[analog computer]]s, [[supercomputer]]s, etc.) usually see three layers of software performing a variety of tasks: platform, application, and user software.@@@@1@32@@danf@17-8-2009 10770380@unknown@formal@none@1@S@;Platform software:@@@@1@2@@danf@17-8-2009 10770390@unknown@formal@none@1@S@[[Platform (computing)|Platform]] includes the [[firmware]], [[device driver]]s, an [[operating system]], and typically a [[graphical user interface]] which, in total, allow a user to interact with the computer and its [[peripheral]]s (associated equipment).@@@@1@32@@danf@17-8-2009 10770400@unknown@formal@none@1@S@Platform software often comes bundled with the computer.@@@@1@8@@danf@17-8-2009 10770410@unknown@formal@none@1@S@On a [[Personal computer|PC]] you will usually have the ability to change the platform software.@@@@1@15@@danf@17-8-2009 10770420@unknown@formal@none@1@S@;Application software:@@@@1@2@@danf@17-8-2009 10770430@unknown@formal@none@1@S@[[Application software]] or Applications are what most people think of when they think of software.@@@@1@15@@danf@17-8-2009 10770440@unknown@formal@none@1@S@Typical examples include office suites and video games.@@@@1@8@@danf@17-8-2009 10770450@unknown@formal@none@1@S@Application software is often purchased separately from computer hardware.@@@@1@9@@danf@17-8-2009 10770460@unknown@formal@none@1@S@Sometimes applications are bundled with the computer, but that does not change the fact that they run as independent applications.@@@@1@20@@danf@17-8-2009 10770470@unknown@formal@none@1@S@Applications are almost always independent programs from the operating system, though they are often tailored for specific platforms.@@@@1@18@@danf@17-8-2009 10770480@unknown@formal@none@1@S@Most users think of compilers, databases, and other "system software" as applications.@@@@1@12@@danf@17-8-2009 10770490@unknown@formal@none@1@S@;User-written software:@@@@1@2@@danf@17-8-2009 10770500@unknown@formal@none@1@S@[[End-user development]] tailors systems to meet users' specific needs.@@@@1@9@@danf@17-8-2009 10770510@unknown@formal@none@1@S@User software include spreadsheet templates, word processor macros, scientific simulations, and scripts for graphics and animations.@@@@1@16@@danf@17-8-2009 10770520@unknown@formal@none@1@S@Even email filters are a kind of user software.@@@@1@9@@danf@17-8-2009 10770530@unknown@formal@none@1@S@Users create this software themselves and often overlook how important it is.@@@@1@12@@danf@17-8-2009 10770535@unknown@formal@none@1@S@Depending on how competently the user-written software has been integrated into purchased application packages, many users may not be aware of 
the distinction between the purchased packages, and what has been added by fellow co-workers.@@@@1@35@@danf@17-8-2009 10770540@unknown@formal@none@1@S@==Creation==@@@@1@1@@danf@17-8-2009 10770550@unknown@formal@none@1@S@==Operation==@@@@1@1@@danf@17-8-2009 10770560@unknown@formal@none@1@S@Computer software has to be "loaded" into the [[computer storage|computer's storage]] (such as a ''[[hard drive]]'', ''memory'', or ''[[RAM]]'').@@@@1@19@@danf@17-8-2009 10770570@unknown@formal@none@1@S@Once the software has loaded, the computer is able to ''execute'' the software.@@@@1@13@@danf@17-8-2009 10770580@unknown@formal@none@1@S@This involves passing [[instruction (computer science)|instructions]] from the application software, through the system software, to the [[hardware]] which ultimately receives the instruction as [[machine language|machine code]].@@@@1@26@@danf@17-8-2009 10770590@unknown@formal@none@1@S@Each instruction causes the computer to carry out an operation -- moving [[data (computing)|data]], carrying out a [[computation]], or altering the [[control flow]] of instructions.@@@@1@25@@danf@17-8-2009 10770600@unknown@formal@none@1@S@Data movement is typically from one place in memory to another.@@@@1@11@@danf@17-8-2009 10770610@unknown@formal@none@1@S@Sometimes it involves moving data between memory and registers which enable high-speed data access in the CPU.@@@@1@17@@danf@17-8-2009 10770620@unknown@formal@none@1@S@Moving data, especially large amounts of it, can be costly.@@@@1@10@@danf@17-8-2009 10770630@unknown@formal@none@1@S@So, this is sometimes avoided by using "pointers" to data instead.@@@@1@11@@danf@17-8-2009 10770640@unknown@formal@none@1@S@Computations include simple operations such as incrementing the value of a variable data element.@@@@1@14@@danf@17-8-2009 10770650@unknown@formal@none@1@S@More complex computations may involve many operations and data elements together.@@@@1@11@@danf@17-8-2009 10770660@unknown@formal@none@1@S@Instructions may be performed sequentially, conditionally, or iteratively.@@@@1@8@@danf@17-8-2009 10770670@unknown@formal@none@1@S@Sequential instructions are those operations that are performed one after another.@@@@1@11@@danf@17-8-2009 10770680@unknown@formal@none@1@S@Conditional instructions are performed such that different sets of instructions execute depending on the value(s) of some data.@@@@1@18@@danf@17-8-2009 10770690@unknown@formal@none@1@S@In some languages this is known as an "if" statement.@@@@1@10@@danf@17-8-2009 10770700@unknown@formal@none@1@S@Iterative instructions are performed repetitively and may depend on some data value.@@@@1@12@@danf@17-8-2009 10770710@unknown@formal@none@1@S@This is sometimes called a "loop."@@@@1@6@@danf@17-8-2009 10770720@unknown@formal@none@1@S@Often, one instruction may "call" another set of instructions that are defined in some other program or [[module (programming)|module]].@@@@1@19@@danf@17-8-2009 10770730@unknown@formal@none@1@S@When more than one computer processor is used, instructions may be executed simultaneously.@@@@1@13@@danf@17-8-2009 10770740@unknown@formal@none@1@S@A simple example of the way software operates is what happens when a user selects an entry such as "Copy" from a menu.@@@@1@23@@danf@17-8-2009 10770750@unknown@formal@none@1@S@In this case, a conditional instruction is executed to copy text from data in a 'document' area residing in memory, perhaps to an intermediate storage area known as a 'clipboard' data area.@@@@1@32@@danf@17-8-2009 10770760@unknown@formal@none@1@S@If a 
different menu entry such as "Paste" is chosen, the software may execute the instructions to copy the text from the clipboard data area to a specific location in the same or another document in memory.@@@@1@37@@danf@17-8-2009 10770770@unknown@formal@none@1@S@Depending on the application, even the example above could become complicated.@@@@1@11@@danf@17-8-2009 10770780@unknown@formal@none@1@S@The field of [[software engineering]] endeavors to manage the complexity of how software operates.@@@@1@14@@danf@17-8-2009 10770790@unknown@formal@none@1@S@This is especially true for software that operates in the context of a large or powerful [[computer system]].@@@@1@18@@danf@17-8-2009 10770800@unknown@formal@none@1@S@Currently, almost the only limitations on the use of computer software in applications is the ingenuity of the designer/programmer.@@@@1@19@@danf@17-8-2009 10770810@unknown@formal@none@1@S@Consequently, large areas of activities (such as playing grand master level chess) formerly assumed to be incapable of software simulation are now routinely programmed.@@@@1@24@@danf@17-8-2009 10770820@unknown@formal@none@1@S@The only area that has so far proved reasonably secure from software simulation is the realm of human art— especially, pleasing music and literature.@@@@1@24@@danf@17-8-2009 10770830@unknown@formal@none@1@S@Kinds of software by operation: [[computer program]] as [[executable]], [[source code]] or [[script (computer programming)|script]], [[computer configuration|configuration]].@@@@1@17@@danf@17-8-2009 10770840@unknown@formal@none@1@S@==Quality and reliability==@@@@1@3@@danf@17-8-2009 10770850@unknown@formal@none@1@S@[[Software reliability]] considers the errors, faults, and failures related to the design, implementation and operation of software.@@@@1@17@@danf@17-8-2009 10770860@unknown@formal@none@1@S@'''See''' [[Computer security audit|Software auditing]], [[Software quality]], [[Software testing]], and [[Software reliability]].@@@@1@12@@danf@17-8-2009 10770870@unknown@formal@none@1@S@==License==@@@@1@1@@danf@17-8-2009 10770880@unknown@formal@none@1@S@[[Software license]] gives the user the right to use the software in the licensed environment, some software comes with the license when purchased off the shelf, or an OEM license when bundled with hardware.@@@@1@34@@danf@17-8-2009 10770890@unknown@formal@none@1@S@Other software comes with a [[free software licence]], granting the recipient the rights to modify and redistribute the software.@@@@1@19@@danf@17-8-2009 10770900@unknown@formal@none@1@S@Software can also be in the form of [[freeware]] or [[shareware]].@@@@1@11@@danf@17-8-2009 10770910@unknown@formal@none@1@S@See also [[License Management]].@@@@1@4@@danf@17-8-2009 10770920@unknown@formal@none@1@S@==Patents==@@@@1@1@@danf@17-8-2009 10770930@unknown@formal@none@1@S@The issue of [[software patent]]s is controversial.@@@@1@7@@danf@17-8-2009 10770940@unknown@formal@none@1@S@Some believe that they hinder [[software development]], while others argue that software patents provide an important incentive to spur software innovation.@@@@1@21@@danf@17-8-2009 10770950@unknown@formal@none@1@S@See [[software patent debate]].@@@@1@4@@danf@17-8-2009 10770960@unknown@formal@none@1@S@==Ethics and rights for software users==@@@@1@6@@danf@17-8-2009 10770970@unknown@formal@none@1@S@Being a new part of society, the idea of what rights users of software should have is not very developed.@@@@1@20@@danf@17-8-2009 10770980@unknown@formal@none@1@S@Some, such as the [[free software community]], 
believe that software users should be free to modify and redistribute the software they use.@@@@1@22@@danf@17-8-2009 10770990@unknown@formal@none@1@S@They argue that these rights are necessary so that each individual can control their own computer, and so that everyone can cooperate, if they choose, to work together as a community and control the direction in which software progresses.@@@@1@38@@danf@17-8-2009 10770995@unknown@formal@none@1@S@Others believe that software authors should have the power to say what rights the user will get.@@@@1@17@@danf@17-8-2009 10771000@unknown@formal@none@1@S@==Software companies and non-profit organizations==@@@@1@5@@danf@17-8-2009 10771010@unknown@formal@none@1@S@Examples of non-profit software organizations: [[Free Software Foundation]], [[GNU Project]], and the [[Mozilla Foundation]].@@@@1@13@@danf@17-8-2009 10771020@unknown@formal@none@1@S@Examples of large software companies are: [[Microsoft]], [[IBM]], [[Oracle_Corporation|Oracle]], [[SAP AG|SAP]] and [[HP]].@@@@1@13@@danf@17-8-2009 10780010@unknown@formal@none@1@S@
Spanish language
@@@@1@2@@danf@17-8-2009 10780020@unknown@formal@none@1@S@'''Spanish''' or '''Castilian''' (''castellano'') is an [[Indo-European]], [[Romance languages|Romance language]] that originated in northern [[Spain]], and gradually spread in the [[Kingdom of Castile]] and evolved into the principal language of government and trade.@@@@1@33@@danf@17-8-2009 10780030@unknown@formal@none@1@S@It was taken to [[Spanish Empire#Territories in Africa (1898–1975)|Africa]], the [[Spanish colonization of the Americas|Americas]], and [[Spanish East Indies|Asia Pacific]] with the expansion of the [[Spanish Empire]] between the fifteenth and nineteenth centuries.@@@@1@33@@danf@17-8-2009 10780040@unknown@formal@none@1@S@Today, between 322 and 400 million people speak Spanish as a native language, making it the world's second most-spoken language by native speakers (after [[Standard Mandarin|Mandarin Chinese]]).@@@@1@27@@danf@17-8-2009 10780050@unknown@formal@none@1@S@==Hispanosphere==@@@@1@1@@danf@17-8-2009 10780060@unknown@formal@none@1@S@It is estimated that the combined total of native and non-native Spanish speakers is approximately 500 million, likely making it the third most spoken language by total number of speakers (after [[English_language|English]] and [[Chinese_language|Chinese]]).@@@@1@34@@danf@17-8-2009 10780070@unknown@formal@none@1@S@Today, Spanish is an official language of Spain, most [[Latin American]] countries, and [[Equatorial Guinea]]; 21 nations speak it as their primary language.@@@@1@23@@danf@17-8-2009 10780080@unknown@formal@none@1@S@Spanish also is one of [[United Nations#Languages|six official languages]] of the [[United Nations]].@@@@1@13@@danf@17-8-2009 10780090@unknown@formal@none@1@S@[[Mexico]] has the world's largest Spanish-speaking population, and Spanish is the second most-widely spoken language in the [[United States]] and the most popular studied foreign language in [[United States|U.S.]] schools and universities.@@@@1@32@@danf@17-8-2009 10780100@unknown@formal@none@1@S@[[Global internet usage]] statistics for 2007 show Spanish as the third most commonly used language on the Internet, after English and [[Chinese language|Chinese]].@@@@1@23@@danf@17-8-2009 10780110@unknown@formal@none@1@S@==Naming and origin==@@@@1@3@@danf@17-8-2009 10780120@unknown@formal@none@1@S@Spaniards tend to call this language {{lang|es|'''''español'''''}} (Spanish) when contrasting it with languages of other states, such as [[French language|French]] and [[English language|English]], but call it {{lang|es|'''''castellano'''''}} (Castilian), that is, the language of the [[Castile (historical region)|Castile]] region, when contrasting it with other [[languages of Spain|languages spoken in Spain]] such as [[Galician language|Galician]], [[Basque language|Basque]], and [[Catalan language|Catalan]].@@@@1@58@@danf@17-8-2009 10780130@unknown@formal@none@1@S@This reasoning also holds true for the language's preferred name in some [[Hispanic America]]n countries.@@@@1@15@@danf@17-8-2009 10780140@unknown@formal@none@1@S@In this manner, the [[Spanish Constitution of 1978]] uses the term {{lang|es|''castellano''}} to define the [[official language]] of the whole Spanish State, as opposed to {{lang|es|''las demás lenguas españolas''}} (lit. 
''the other Spanish languages'').@@@@1@34@@danf@17-8-2009 10780150@unknown@formal@none@1@S@Article III reads as follows:@@@@1@5@@danf@17-8-2009 10780160@unknown@formal@none@1@S@The name ''castellano'' is, however, widely used for the language as a whole in Latin America.@@@@1@16@@danf@17-8-2009 10780170@unknown@formal@none@1@S@Some Spanish speakers consider ''{{lang|es|castellano}}'' a generic term with no political or ideological links, much as "Spanish" is in English.@@@@1@20@@danf@17-8-2009 10780180@unknown@formal@none@1@S@Often Latin Americans use it to differentiate their own variety of Spanish as opposed to the variety of Spanish spoken in Spain, or variety of Spanish which is considered as standard in the region.@@@@1@34@@danf@17-8-2009 10780190@unknown@formal@none@1@S@==Classification and related languages==@@@@1@4@@danf@17-8-2009 10780200@unknown@formal@none@1@S@Spanish is closely related to the other [[West Iberian languages|West Iberian]] Romance languages: [[Asturian language|Asturian]] ({{lang|ast|''asturianu''}}), [[Galician language|Galician]] ({{lang|gl|''galego''}}), [[Ladino language|Ladino]] ({{lang|lad|''dzhudezmo/spanyol/kasteyano''}}), and [[Portuguese language|Portuguese]] ({{lang|pt|''português''}}).@@@@1@26@@danf@17-8-2009 10780210@unknown@formal@none@1@S@Catalan, an [[Iberian Romance languages|East Iberian language]] which exhibits many [[Gallo-Romance]] traits, is more similar to the neighbouring [[Occitan language]] ({{lang|oc|''occitan''}}) than to Spanish, or indeed than Spanish and Portuguese are to each other.@@@@1@34@@danf@17-8-2009 10780220@unknown@formal@none@1@S@Spanish and Portuguese share similar grammars and vocabulary as well as a common history of [[Influence of Arabic on other languages|Arabic influence]] while a great part of the peninsula was under [[Timeline of the Muslim presence in the Iberian peninsula|Islamic rule]] (both languages expanded over [[Islamic empire|Islamic territories]]).@@@@1@48@@danf@17-8-2009 10780230@unknown@formal@none@1@S@Their [[lexical similarity]] has been estimated as 89%.@@@@1@8@@danf@17-8-2009 10780240@unknown@formal@none@1@S@See [[Differences between Spanish and Portuguese]] for further information.@@@@1@9@@danf@17-8-2009 10780250@unknown@formal@none@1@S@===Ladino===@@@@1@1@@danf@17-8-2009 10780260@unknown@formal@none@1@S@Ladino, which is essentially medieval Spanish and closer to modern Spanish than any other language, is spoken by many descendants of the [[Sephardi Jews]] who were [[Alhambra decree|expelled from Spain in the 15th century]].@@@@1@34@@danf@17-8-2009 10780270@unknown@formal@none@1@S@Ladino speakers are currently almost exclusively [[Sephardim|Sephardi]] Jews, with family roots in Turkey, Greece or the Balkans: current speakers mostly live in Israel and Turkey, with a few pockets in Latin America.@@@@1@32@@danf@17-8-2009 10780280@unknown@formal@none@1@S@It lacks the [[Amerindian languages|Native American vocabulary]] which was influential during the [[Spanish Empire|Spanish colonial period]], and it retains many archaic features which have since been lost in standard Spanish.@@@@1@30@@danf@17-8-2009 10780290@unknown@formal@none@1@S@It contains, however, other vocabulary which is not found in standard Castilian, including vocabulary from [[Hebrew language|Hebrew]], some French, Greek and [[Turkish language|Turkish]], and other languages spoken where the Sephardim settled.@@@@1@31@@danf@17-8-2009 10780300@unknown@formal@none@1@S@Ladino is in serious danger of extinction because many native speakers 
today are elderly, many of them ''olim'' (immigrants to [[Israel]]) who have not transmitted the language to their children or grandchildren.@@@@1@33@@danf@17-8-2009 10780310@unknown@formal@none@1@S@However, it is experiencing a minor revival among Sephardi communities, especially in music.@@@@1@13@@danf@17-8-2009 10780320@unknown@formal@none@1@S@In the case of the Latin American communities, the danger of extinction is also due to the risk of assimilation by modern Castilian.@@@@1@23@@danf@17-8-2009 10780330@unknown@formal@none@1@S@A related dialect is [[Haketia]], the Judaeo-Spanish of northern Morocco.@@@@1@10@@danf@17-8-2009 10780340@unknown@formal@none@1@S@This too tended to assimilate with modern Spanish during the Spanish occupation of the region.@@@@1@15@@danf@17-8-2009 10780350@unknown@formal@none@1@S@===Vocabulary comparison===@@@@1@2@@danf@17-8-2009 10780360@unknown@formal@none@1@S@Spanish and [[Italian language|Italian]] share a very similar phonological system.@@@@1@10@@danf@17-8-2009 10780370@unknown@formal@none@1@S@At present, the [[lexical similarity]] with Italian is estimated at 82%.@@@@1@11@@danf@17-8-2009 10780380@unknown@formal@none@1@S@As a result, Spanish and Italian are mutually intelligible to various degrees.@@@@1@12@@danf@17-8-2009 10780390@unknown@formal@none@1@S@The lexical similarity with [[Portuguese language|Portuguese]] is greater, 89%, but the vagaries of Portuguese pronunciation make it less easily understood by Hispanophones than Italian.@@@@1@24@@danf@17-8-2009 10780400@unknown@formal@none@1@S@[[Mutual intelligibility]] between Spanish and [[French language|French]] or [[Romanian language|Romanian]] is even lower (lexical similarity being respectively 75% and 71%): comprehension of Spanish by French speakers who have not studied the language is as low as an estimated 45%, the same as for English.@@@@1@45@@danf@17-8-2009 10780410@unknown@formal@none@1@S@The common features of the writing systems of the Romance languages allow for a greater amount of interlingual reading comprehension than oral communication would.@@@@1@24@@danf@17-8-2009 10780420@unknown@formal@none@1@S@ 1. also {{lang|pt|''nós outros''}} in early modern Portuguese (e.g. ''[[The Lusiads]]'')@@@@1@12@@danf@17-8-2009 10780430@unknown@formal@none@1@S@2. {{lang|it|''noi '''altri'''''}} in Southern [[List of languages of Italy|Italian dialects and languages]]@@@@1@13@@danf@17-8-2009 10780440@unknown@formal@none@1@S@3. 
Alternatively {{lang|fr|''nous '''autres'''''}} @@@@1@5@@danf@17-8-2009 10780460@unknown@formal@none@1@S@==History==@@@@1@1@@danf@17-8-2009 10780470@unknown@formal@none@1@S@Spanish evolved from [[Vulgar Latin]], with major [[Arabic influence on the Spanish language|influences from Arabic]] in vocabulary during the [[Al-Andalus|Andalusian]] period and minor surviving influences from [[Basque language|Basque]] and [[Celtiberian language|Celtiberian]], as well as [[Germanic languages]] via the [[Visigoths]].@@@@1@39@@danf@17-8-2009 10780480@unknown@formal@none@1@S@Spanish developed along the remote crossroad strips among the [[Alava]], [[Cantabria]], [[Burgos]], [[Soria]] and [[La Rioja (autonomous community)|La Rioja]] provinces of Northern Spain, as a strongly innovative and differing variant from its nearest cousin, [[Asturian|Leonese speech]], with a higher degree of Basque influence in these regions (see [[Iberian Romance languages]]).@@@@1@51@@danf@17-8-2009 10780490@unknown@formal@none@1@S@Typical features of Spanish diachronic [[phonology]] include [[lenition]] (Latin {{lang|la|''vita''}}, Spanish {{lang|es|''vida''}}), [[palatalization]] (Latin {{lang|la|''annum''}}, Spanish {{lang|es|''año''}}, and Latin {{lang|la|''anellum''}}, Spanish {{lang|es|''anillo''}}) and [[diphthong]]ization ([[stem (linguistics)|stem]]-changing) of short ''e'' and ''o'' from Vulgar Latin (Latin {{lang|la|''terra''}}, Spanish {{lang|es|''tierra''}}; Latin {{lang|la|''novus''}}, Spanish {{lang|es|''nuevo''}}).@@@@1@42@@danf@17-8-2009 10780500@unknown@formal@none@1@S@Similar phenomena can be found in other Romance languages as well.@@@@1@11@@danf@17-8-2009 10780510@unknown@formal@none@1@S@During the {{lang|es|''[[Reconquista]]''}}, this northern dialect from [[Cantabria]] was carried south, and it remains a [[minority language]] in northern coastal [[Morocco]].@@@@1@21@@danf@17-8-2009 10780520@unknown@formal@none@1@S@The first Latin-to-Spanish grammar ({{lang|es|''Gramática de la Lengua Castellana''}}) was written in [[Salamanca]], Spain, in 1492, by [[Antonio de Nebrija|Elio Antonio de Nebrija]].@@@@1@23@@danf@17-8-2009 10780530@unknown@formal@none@1@S@When it was presented to [[Isabel de Castilla]], she asked, "What do I want a work like this for, if I already know the language?", to which he replied, "Your highness, the language is the instrument of the Empire."@@@@1@39@@danf@17-8-2009 10780540@unknown@formal@none@1@S@From the 16th century onwards, the language was taken to the [[Americas]] and the [[Spanish East Indies]] via [[Spanish colonization of the Americas|Spanish colonization]].@@@@1@24@@danf@17-8-2009 10780550@unknown@formal@none@1@S@In the 20th century, Spanish was introduced to [[Equatorial Guinea]] and the [[Western Sahara]], as well as to areas of the United States that had not been part of the Spanish Empire, such as [[Spanish Harlem]] in [[New York City]].@@@@1@35@@danf@17-8-2009 10780560@unknown@formal@none@1@S@For details on borrowed words and other external influences upon Spanish, see [[Influences on the Spanish language]].@@@@1@17@@danf@17-8-2009 10780570@unknown@formal@none@1@S@===Characterization===@@@@1@1@@danf@17-8-2009 10780580@unknown@formal@none@1@S@A defining characteristic of Spanish was the [[diphthong]]ization of the Latin short vowels ''e'' and ''o'' into ''ie'' and ''ue'', respectively, when they were stressed.@@@@1@25@@danf@17-8-2009 10780590@unknown@formal@none@1@S@Similar [[sound law|sound changes]] are found in other Romance languages, but in Spanish they were 
significant.@@@@1@16@@danf@17-8-2009 10780600@unknown@formal@none@1@S@Some examples:@@@@1@2@@danf@17-8-2009 10780610@unknown@formal@none@1@S@* Lat. {{lang|la|''petra''}} > Sp. {{lang|es|''piedra''}}, It. {{lang|it|''pietra''}}, Fr. {{lang|fr|''pierre''}}, Rom. {{lang|ro|''piatrǎ''}}, Port./Gal. {{lang|pt|''pedra''}} "stone".@@@@1@15@@danf@17-8-2009 10780620@unknown@formal@none@1@S@* Lat. {{lang|la|''moritur''}} > Sp. {{lang|es|''muere''}}, It. {{lang|it|''muore''}}, Fr. {{lang|fr|''meurt''}} / {{lang|fr|''muert''}}, Rom. {{lang|ro|''moare''}}, Port./Gal. {{lang|pt|''morre''}} "die".@@@@1@17@@danf@17-8-2009 10780630@unknown@formal@none@1@S@Peculiar to early Spanish (as in the [[Gascon]] dialect of Occitan, and possibly due to a Basque [[substratum]]) was the mutation of Latin initial ''f-'' into ''h-'' whenever it was followed by a vowel that did not diphthongate.@@@@1@38@@danf@17-8-2009 10780640@unknown@formal@none@1@S@Compare for instance:@@@@1@3@@danf@17-8-2009 10780650@unknown@formal@none@1@S@* Lat. {{lang|la|''filium''}} > It. {{lang|it|''figlio''}}, Port. {{lang|pt|''filho''}}, Gal. {{lang|gl|''fillo''}}, Fr. {{lang|fr|''fils''}}, Occitan {{lang|oc|''filh''}} (but Gascon {{lang|gsc|''hilh''}}) Sp. {{lang|es|''hijo''}} (but Ladino {{lang|lad|''fijo''}});@@@@1@22@@danf@17-8-2009 10780660@unknown@formal@none@1@S@* Lat. {{lang|la|''fabulari''}} > Lad. {{lang|lad|''favlar''}}, Port./Gal. {{lang|pt|''falar''}}, Sp. {{lang|es|''hablar''}};@@@@1@10@@danf@17-8-2009 10780670@unknown@formal@none@1@S@* but Lat. {{lang|la|''focum''}} > It. {{lang|it|''fuoco''}}, Port./Gal. {{lang|pt|''fogo''}}, Sp./Lad. {{lang|es|''fuego''}}.@@@@1@11@@danf@17-8-2009 10780680@unknown@formal@none@1@S@Some [[consonant cluster]]s of Latin also produced characteristically different results in these languages, for example:@@@@1@15@@danf@17-8-2009 10780690@unknown@formal@none@1@S@* Lat. {{lang|la|''clamare''}}, acc. {{lang|la|''flammam''}}, {{lang|la|''plenum''}} > Lad. {{lang|lad|''lyamar''}}, {{lang|lad|''flama''}}, {{lang|lad|''pleno''}}; Sp. {{lang|es|''llamar''}}, {{lang|es|''llama''}}, {{lang|es|''lleno''}}.@@@@1@15@@danf@17-8-2009 10780700@unknown@formal@none@1@S@However, in Spanish there are also the forms {{lang|la|''clamar''}}, {{lang|lad|''flama''}}, {{lang|lad|''pleno''}}; Port. {{lang|pt|''chamar''}}, {{lang|pt|''chama''}}, {{lang|pt|''cheio''}}; Gal. {{lang|gl|''chamar''}}, {{lang|gl|''chama''}}, {{lang|gl|''cheo''}}.@@@@1@19@@danf@17-8-2009 10780710@unknown@formal@none@1@S@* Lat. acc. {{lang|la|''octo''}}, {{lang|la|''noctem''}}, {{lang|la|''multum''}} > Lad. {{lang|lad|''ocho''}}, {{lang|lad|''noche''}}, {{lang|lad|''muncho''}}; Sp. {{lang|es|''ocho''}}, {{lang|es|''noche''}}, {{lang|es|''mucho''}}; Port. {{lang|pt|''oito''}}, {{lang|pt|''noite''}}, {{lang|pt|''muito''}}; Gal. 
{{lang|gl|''oito''}}, {{lang|gl|''noite''}}, {{lang|gl|''moito''}}.@@@@1@23@@danf@17-8-2009 10780720@unknown@formal@none@1@S@==Geographic distribution==@@@@1@2@@danf@17-8-2009 10780730@unknown@formal@none@1@S@Spanish is one of the official languages of the [[European Union]], the [[Organization of American States]], the [[Organization of Ibero-American States]], the [[United Nations]], and the [[Union of South American Nations]].@@@@1@31@@danf@17-8-2009 10780740@unknown@formal@none@1@S@===Europe===@@@@1@1@@danf@17-8-2009 10780750@unknown@formal@none@1@S@Spanish is an official language of Spain, the country for which it is named and from which it originated.@@@@1@19@@danf@17-8-2009 10780760@unknown@formal@none@1@S@It is also spoken in [[Gibraltar]], though English is the official language.@@@@1@12@@danf@17-8-2009 10780770@unknown@formal@none@1@S@Likewise, it is spoken in [[Andorra]], though [[Catalan language|Catalan]] is the official language.@@@@1@13@@danf@17-8-2009 10780780@unknown@formal@none@1@S@It is also spoken by small communities in other European countries, such as the [[United Kingdom]], [[France]], and [[Germany]].@@@@1@19@@danf@17-8-2009 10780790@unknown@formal@none@1@S@Spanish is an official language of the [[European Union]].@@@@1@9@@danf@17-8-2009 10780800@unknown@formal@none@1@S@In Switzerland, Spanish is the [[mother tongue]] of 1.7% of the population, making it the largest minority language after the country's four official languages.@@@@1@24@@danf@17-8-2009 10780810@unknown@formal@none@1@S@===The Americas===@@@@1@3@@danf@17-8-2009 10780820@unknown@formal@none@1@S@====Latin America====@@@@1@2@@danf@17-8-2009 10780830@unknown@formal@none@1@S@Most Spanish speakers are in [[Latin America]]; of the countries with the most Spanish speakers, only [[Spain]] is outside the [[Americas]].@@@@1@22@@danf@17-8-2009 10780840@unknown@formal@none@1@S@[[Mexico]] has the world's largest number of native Spanish speakers.@@@@1@8@@danf@17-8-2009 10780850@unknown@formal@none@1@S@Nationally, Spanish is the official language of [[Argentina]], [[Bolivia]] (co-official [[Quechua]] and [[Aymara language|Aymara]]), [[Chile]], [[Colombia]], [[Costa Rica]], [[Cuba]], [[Dominican Republic]], [[Ecuador]], [[El Salvador]], [[Guatemala]], [[Honduras]], [[Mexico]], [[Nicaragua]], [[Panama]], [[Paraguay]] (co-official [[Guarani language|Guaraní]]), [[Peru]] (co-official [[Quechua]] and, in some regions, [[Aymara language|Aymara]]), [[Uruguay]], and [[Venezuela]].@@@@1@46@@danf@17-8-2009 10780860@unknown@formal@none@1@S@Spanish is also the official language (co-official with [[English language|English]]) in the U.S. 
commonwealth of [[Puerto Rico]].@@@@1@17@@danf@17-8-2009 10780870@unknown@formal@none@1@S@Spanish has no official recognition in the former [[British overseas territories|British colony]] of [[Belize]]; however, per the 2000 census, it is spoken by 43% of the population.@@@@1@27@@danf@17-8-2009 10780880@unknown@formal@none@1@S@Mainly, it is spoken by Hispanic descendants who have remained in the region since the 17th century; however, English is the official language.@@@@1@22@@danf@17-8-2009 10780890@unknown@formal@none@1@S@Spain first colonized [[Trinidad and Tobago]] in [[1498]], introducing the Spanish language to the [[Carib]] people.@@@@1@15@@danf@17-8-2009 10780900@unknown@formal@none@1@S@The [[Cocoa Panyol]]s, laborers from Venezuela, also took their culture and language with them; they are credited with the music of "[[Parang]]" ("[[Parranda]]") on the island.@@@@1@26@@danf@17-8-2009 10780910@unknown@formal@none@1@S@Because of Trinidad's location on the South American coast, the country is much influenced by its Spanish-speaking neighbors.@@@@1@18@@danf@17-8-2009 10780920@unknown@formal@none@1@S@A recent census shows that more than 1,500 inhabitants speak Spanish.@@@@1@11@@danf@17-8-2009 10780930@unknown@formal@none@1@S@The government announced the ''Spanish as a First Foreign Language'' (SAFFL) initiative in 2004 and launched it in March 2005.@@@@1@17@@danf@17-8-2009 10780940@unknown@formal@none@1@S@Government regulations require Spanish to be taught, beginning in primary school, while thirty percent of public employees are to be linguistically competent within five years.@@@@1@25@@danf@17-8-2009 10780950@unknown@formal@none@1@S@The government also announced that Spanish will be the country's second official language by [[2020]], alongside English.@@@@1@17@@danf@17-8-2009 10780960@unknown@formal@none@1@S@Spanish is important in [[Brazil]] because of its proximity to and increased trade with its Spanish-speaking neighbors; for example, as a member of the [[Mercosur]] trading bloc.@@@@1@27@@danf@17-8-2009 10780970@unknown@formal@none@1@S@In 2005, the [[National Congress of Brazil]] approved a bill, signed into law by the [[President of Brazil|President]], making Spanish available as a foreign language in secondary schools.@@@@1@28@@danf@17-8-2009 10780980@unknown@formal@none@1@S@In many border towns and villages (especially on the Uruguayan-Brazilian border), a [[mixed language]] known as [[Riverense Portuñol|Portuñol]] is spoken.@@@@1@20@@danf@17-8-2009 10780990@unknown@formal@none@1@S@====United States====@@@@1@2@@danf@17-8-2009 10781000@unknown@formal@none@1@S@In the 2006 census, 44.3 million people of the U.S. population were [[Hispanic]] or [[Latino]] by origin; 34 million people, or 12.2 percent of the population older than 5 years, speak Spanish at home.@@@@1@33@@danf@17-8-2009 10781005@unknown@formal@none@1@S@Spanish has a [[Spanish in the United States|long history in the United States]] (many south-western states were part of Mexico and Spain), and it recently has been revitalized by much immigration from Latin America.@@@@1@34@@danf@17-8-2009 10781010@unknown@formal@none@1@S@Spanish is the most widely taught foreign language in the country.@@@@1@11@@danf@17-8-2009 10781020@unknown@formal@none@1@S@Although the United States has no formally designated "official languages," Spanish is formally recognized at the state level alongside English; in the U.S. 
state of [[New Mexico]], 30 per cent of the population speak it.@@@@1@35@@danf@17-8-2009 10781030@unknown@formal@none@1@S@It also has strong influence in metropolitan areas such as Los Angeles, Miami and New York City.@@@@1@17@@danf@17-8-2009 10781040@unknown@formal@none@1@S@Spanish is the dominant spoken language in [[Puerto Rico]], a U.S. territory.@@@@1@12@@danf@17-8-2009 10781050@unknown@formal@none@1@S@In total, the U.S. has the world's fifth-largest Spanish-speaking population.@@@@1@10@@danf@17-8-2009 10781060@unknown@formal@none@1@S@===Asia===@@@@1@1@@danf@17-8-2009 10781070@unknown@formal@none@1@S@Spanish was an official language of the [[Philippines]] but was never spoken by a majority of the population.@@@@1@18@@danf@17-8-2009 10781080@unknown@formal@none@1@S@Movements to teach the language to the masses were started but were stopped by the friars.@@@@1@18@@danf@17-8-2009 10781090@unknown@formal@none@1@S@Its importance fell in the first half of the 20th century following the U.S. occupation and administration of the islands.@@@@1@20@@danf@17-8-2009 10781100@unknown@formal@none@1@S@The introduction of the English language in the Philippine government system put an end to the use of Spanish as the official language.@@@@1@23@@danf@17-8-2009 10781110@unknown@formal@none@1@S@The language lost its official status in 1973 during the [[Ferdinand Marcos]] administration.@@@@1@13@@danf@17-8-2009 10781120@unknown@formal@none@1@S@Spanish is spoken mainly by small communities of Filipino-born Spaniards, Latin Americans, and Filipino [[mestizo]]s (mixed race), descendants of the early colonial Spanish settlers.@@@@1@24@@danf@17-8-2009 10781130@unknown@formal@none@1@S@Throughout the 20th century, the Spanish language has declined in importance compared to English and [[Tagalog language|Tagalog]].@@@@1@17@@danf@17-8-2009 10781140@unknown@formal@none@1@S@According to the 1990 Philippine census, there were 2,658 native speakers of Spanish.@@@@1@13@@danf@17-8-2009 10781150@unknown@formal@none@1@S@No figures were provided during the 1995 and 2000 censuses; however, figures for 2000 did specify there were over 600,000 native speakers of [[Chavacano language|Chavacano]], a Spanish-based [[Creole language|creole]] language spoken in [[Cavite]] and [[Zamboanga]].@@@@1@36@@danf@17-8-2009 10781160@unknown@formal@none@1@S@Some other sources put the number of Spanish speakers in the Philippines at around two to three million; however, these sources are disputed.@@@@1@22@@danf@17-8-2009 10781170@unknown@formal@none@1@S@Tagalog has about 4,000 words adopted from Spanish, and there are around 6,000 Spanish loanwords in Visayan and other Philippine languages as well.@@@@1@22@@danf@17-8-2009 10781180@unknown@formal@none@1@S@Today Spanish is offered as a foreign language in Philippine schools and universities.@@@@1@13@@danf@17-8-2009 10781190@unknown@formal@none@1@S@===Africa===@@@@1@1@@danf@17-8-2009 10781200@unknown@formal@none@1@S@In Africa, Spanish is official in the UN-recognised but Moroccan-occupied [[Western Sahara]] (co-official [[Arabic language|Arabic]]) and [[Equatorial Guinea]] (co-official [[French language|French]] and [[Portuguese language|Portuguese]]).@@@@1@24@@danf@17-8-2009 10781210@unknown@formal@none@1@S@Today, nearly 200,000 refugee Sahrawis are able to read and write in Spanish, and several thousand have received a [[university]] education in foreign countries as part of aid packages (mainly in [[Cuba]] and [[Spain]]).@@@@1@32@@danf@17-8-2009 10781220@unknown@formal@none@1@S@In Equatorial 
Guinea, Spanish is the predominant language when counting native and non-native speakers (around 500,000 people), while [[Fang language|Fang]] is the language with the largest number of native speakers.@@@@1@31@@danf@17-8-2009 10781230@unknown@formal@none@1@S@It is also spoken in the Spanish cities in [[Plazas de soberanía|continental North Africa]] ([[Ceuta]] and [[Melilla]]) and in the autonomous community of the [[Canary Islands]] (143,000 and 1,995,833 people, respectively).@@@@1@30@@danf@17-8-2009 10781240@unknown@formal@none@1@S@Within Northern Morocco, a former [[History of Morocco#European influence|Franco-Spanish protectorate]] that is also geographically close to Spain, approximately 20,000 people speak Spanish.@@@@1@22@@danf@17-8-2009 10781250@unknown@formal@none@1@S@It is spoken by some communities in [[Angola]], because of the Cuban influence from the [[Cold War]], and in [[Nigeria]] by the descendants of [[Afro-Cuban]] ex-slaves.@@@@1@26@@danf@17-8-2009 10781260@unknown@formal@none@1@S@In [[Côte d'Ivoire]] and [[Senegal]], Spanish can be learned as a second foreign language in the public education system.@@@@1@19@@danf@17-8-2009 10781270@unknown@formal@none@1@S@In 2008, [[Cervantes Institute]] centers will be opened in [[Lagos]] and [[Johannesburg]], the first in [[Sub-Saharan Africa]].@@@@1@19@@danf@17-8-2009 10781280@unknown@formal@none@1@S@===Oceania===@@@@1@1@@danf@17-8-2009 10781290@unknown@formal@none@1@S@Among the countries and territories in [[Oceania]], Spanish is also spoken in [[Easter Island]], a territorial possession of Chile.@@@@1@19@@danf@17-8-2009 10781300@unknown@formal@none@1@S@According to the 2001 census, there are approximately 95,000 speakers of Spanish in Australia, 44,000 of whom live in Greater Sydney, where the older [[:Category: Australians of Mexican descent|Mexican]], [[:Category:Australians of Colombian descent|Colombian]], and [[:Category: Australians of Spanish descent|Spanish]] populations and newer [[:Category:Australians of Argentine descent|Argentine]], Salvadoran and [[:Category:Australians of Uruguayan descent|Uruguayan]] communities live.@@@@1@55@@danf@17-8-2009 10781310@unknown@formal@none@1@S@The island nations of [[Guam]], [[Palau]], the [[Northern Marianas]], the [[Marshall Islands]] and the [[Federated States of Micronesia]] all once had Spanish speakers, since the [[Marianas Islands|Marianas]] and [[Caroline Islands]] were Spanish colonial possessions until the late 19th century (see [[Spanish-American War]]), but Spanish has since been forgotten.@@@@1@43@@danf@17-8-2009 10781320@unknown@formal@none@1@S@It now exists only as an influence on the local native languages, and it is also spoken by [[Hispanics in the United States|Hispanic American]] resident populations.@@@@1@24@@danf@17-8-2009 10781330@unknown@formal@none@1@S@==Dialectal variation==@@@@1@2@@danf@17-8-2009 10781340@unknown@formal@none@1@S@There are important variations among the regions of Spain and throughout Spanish-speaking America.@@@@1@13@@danf@17-8-2009 10781350@unknown@formal@none@1@S@In countries in Hispanophone America, it is preferable to use the word ''castellano'' to distinguish their version of the language from that of Spain, thus asserting their autonomy and national identity.@@@@1@31@@danf@17-8-2009 10781360@unknown@formal@none@1@S@In Spain the Castilian dialect's pronunciation is commonly regarded as the national standard, although this dialect's use of slightly different pronouns, known as [[Loísmo|{{lang|es|''laísmo''}}]], is deprecated.@@@@1@27@@danf@17-8-2009
10781370@unknown@formal@none@1@S@More accurately, for nearly everyone in Spain, "standard Spanish" means "pronouncing everything exactly as it is written," an ideal which does not correspond to any real dialect, though the northern dialects are the closest to it.@@@@1@36@@danf@17-8-2009 10781380@unknown@formal@none@1@S@In practice, the standard way of speaking Spanish in the media is "written Spanish" for formal speech, "Madrid dialect" (one of the transitional variants between Castilian and Andalusian) for informal speech.@@@@1@31@@danf@17-8-2009 10781390@unknown@formal@none@1@S@===Voseo===@@@@1@1@@danf@17-8-2009 10781400@unknown@formal@none@1@S@Spanish has three [[grammatical person|second-person]] [[grammatical number|singular]] [[pronoun]]s: {{lang|es|''tú''}}, {{lang|es|''usted''}}, and in some parts of Latin America, {{lang|es|''vos''}} (the use of this pronoun and/or its verb forms is called ''voseo'').@@@@1@30@@danf@17-8-2009 10781410@unknown@formal@none@1@S@In those regions where it is used, generally speaking, {{lang|es|''tú''}} and {{lang|es|''vos''}} are informal and used with friends; in other countries, {{lang|es|''vos''}} is considered an archaic form.@@@@1@27@@danf@17-8-2009 10781415@unknown@formal@none@1@S@{{lang|es|''Usted''}} is universally regarded as the formal address (derived from {{lang|es|''vuestra merced''}}, "your grace"), and is used as a mark of respect, as when addressing one's elders or strangers.@@@@1@29@@danf@17-8-2009 10781420@unknown@formal@none@1@S@{{lang|es|''Vos''}} is used extensively as the primary spoken form of the second-person singular pronoun, although with wide differences in social consideration, in many countries of [[Latin America]], including [[Argentina]], [[Chile]], [[Costa Rica]], the central mountain region of [[Ecuador]], the State of [[Chiapas]] in [[Mexico]], [[El Salvador]], [[Guatemala]], [[Honduras]], [[Nicaragua]], [[Paraguay]], [[Uruguay]], the [[Paisa region]] and Caleños of [[Colombia]] and the [[States]] of [[Zulia]] and Trujillo in [[Venezuela]].@@@@1@67@@danf@17-8-2009 10781430@unknown@formal@none@1@S@There are some differences in the verbal endings for ''vos'' in each country.@@@@1@13@@danf@17-8-2009 10781440@unknown@formal@none@1@S@In Argentina, Uruguay, and increasingly in Paraguay and some Central American countries, it is also the standard form used in the [[mass media|media]], but the media in other countries with {{lang|es|''voseo''}} generally continue to use {{lang|es|''usted''}} or {{lang|es|''tú''}} except in advertisements, for instance.@@@@1@43@@danf@17-8-2009 10781445@unknown@formal@none@1@S@{{lang|es|''Vos''}} may also be used regionally in other countries.@@@@1@9@@danf@17-8-2009 10781450@unknown@formal@none@1@S@Depending on country or region, usage may be considered standard or (by better educated speakers) to be unrefined.@@@@1@18@@danf@17-8-2009 10781460@unknown@formal@none@1@S@Interpersonal situations in which the use of ''vos'' is acceptable may also differ considerably between regions.@@@@1@16@@danf@17-8-2009 10781470@unknown@formal@none@1@S@===Ustedes===@@@@1@1@@danf@17-8-2009 10781480@unknown@formal@none@1@S@Spanish forms also differ regarding second-person plural pronouns.@@@@1@8@@danf@17-8-2009 10781490@unknown@formal@none@1@S@The Spanish dialects of Latin America have only one form of the second-person plural for daily use, {{lang|es|''ustedes''}} (formal or familiar, as the case may be, though {{lang|es|''vosotros''}} non-formal usage can sometimes appear in poetry and rhetorical or literary 
style).@@@@1@40@@danf@17-8-2009 10781500@unknown@formal@none@1@S@In Spain there are two forms — {{lang|es|''ustedes''}} (formal) and {{lang|es|''vosotros''}} (familiar).@@@@1@12@@danf@17-8-2009 10781510@unknown@formal@none@1@S@The pronoun {{lang|es|''vosotros''}} is the plural form of {{lang|es|''tú''}} in most of Spain, but in the Americas (and certain southern Spanish cities such as [[Cádiz]] or [[Seville]], and in the [[Canary Islands]]) it is replaced with {{lang|es|''ustedes''}}.@@@@1@37@@danf@17-8-2009 10781520@unknown@formal@none@1@S@It is notable that the use of {{lang|es|''ustedes''}} for the informal plural "you" in southern Spain does not follow the usual rule for pronoun-verb [[agreement (linguistics)|agreement]]; e.g., while the formal form for "you go", {{lang|es|''ustedes van''}}, uses the third-person plural form of the verb, in Cádiz or Seville the informal form is constructed as {{lang|es|''ustedes vais''}}, using the second-person plural of the verb.@@@@1@63@@danf@17-8-2009 10781530@unknown@formal@none@1@S@In the Canary Islands, though, the usual pronoun-verb agreement is preserved in most cases.@@@@1@14@@danf@17-8-2009 10781540@unknown@formal@none@1@S@Some words can be different, even embarrassingly so, in different Hispanophone countries.@@@@1@12@@danf@17-8-2009 10781550@unknown@formal@none@1@S@Most Spanish speakers can recognize other Spanish forms, even in places where they are not commonly used, but Spaniards generally do not recognise specifically American usages.@@@@1@26@@danf@17-8-2009 10781560@unknown@formal@none@1@S@For example, Spanish ''mantequilla'', ''aguacate'' and ''albaricoque'' (respectively, "butter", "avocado", "apricot") correspond to ''manteca'', ''palta'', and ''damasco'', respectively, in Argentina, Chile and Uruguay.@@@@1@23@@danf@17-8-2009 10781570@unknown@formal@none@1@S@The everyday Spanish words ''coger'' (to catch, get, or pick up), ''pisar'' (to step on) and ''concha'' (seashell) are considered extremely rude in parts of Latin America, where the meaning of ''coger'' and ''pisar'' is also "to have sex" and ''concha'' means "vulva".@@@@1@43@@danf@17-8-2009 10781580@unknown@formal@none@1@S@The Puerto Rican word for "bobby pin" (''pinche'') is an obscenity in Mexico, and in [[Nicaragua]] simply means "stingy".@@@@1@19@@danf@17-8-2009 10781590@unknown@formal@none@1@S@Other examples include ''[[taco]]'', which means "swearword" in Spain but is known to the rest of the world as a Mexican dish.@@@@1@22@@danf@17-8-2009 10781600@unknown@formal@none@1@S@''Pija'' in many countries of Latin America is an obscene slang word for "penis", while in [[Spain]] the word also signifies "posh girl" or "snobby".@@@@1@25@@danf@17-8-2009 10781610@unknown@formal@none@1@S@''Coche'', which means "car" in Spain, for the vast majority of Spanish-speakers actually means "baby-stroller", in Guatemala it means "pig", while ''carro'' means "car" in some Latin American countries and "cart" in others, as well as in Spain.@@@@1@38@@danf@17-8-2009 10781620@unknown@formal@none@1@S@The {{lang|es|[[Real Academia Española]]}} (Royal Spanish Academy), together with the 21 other national ones (see [[Association of Spanish Language Academies]]), exercises a standardizing influence through its publication of dictionaries and widely respected grammar and style guides.@@@@1@36@@danf@17-8-2009 10781630@unknown@formal@none@1@S@Due to this influence and for other sociohistorical reasons, a standardized form of the language ([[Standard Spanish]]) is widely acknowledged for use in literature, 
academic contexts and the media.@@@@1@29@@danf@17-8-2009 10781640@unknown@formal@none@1@S@==Writing system==@@@@1@2@@danf@17-8-2009 10781650@unknown@formal@none@1@S@Spanish is written using the [[Latin alphabet]], with the addition of the character ''[[ñ]]'' (''eñe'', representing the phoneme {{IPA|/ɲ/}}, a letter distinct from ''n'', although typographically composed of an ''n'' with a [[tilde]]) and the [[digraph (orthography)|digraph]]s ''ch'' ({{lang|es|''che''}}, representing the phoneme {{IPA|/tʃ/}}) and ''ll'' ({{lang|es|''elle''}}, representing the phoneme {{IPA|/ʎ/}}).@@@@1@50@@danf@17-8-2009 10781660@unknown@formal@none@1@S@However, the digraph ''rr'' ({{lang|es|''erre fuerte''}}, "strong ''r''", {{lang|es|''erre doble''}}, "double ''r''", or simply {{lang|es|''erre''}}), which also represents a distinct phoneme {{IPA|/r/}}, is not similarly regarded as a single letter.@@@@1@30@@danf@17-8-2009 10781670@unknown@formal@none@1@S@Since 1994, the digraphs ''ch'' and ''ll'' are to be treated as letter pairs for [[collation]] purposes, though they remain a part of the alphabet.@@@@1@25@@danf@17-8-2009 10781680@unknown@formal@none@1@S@Words with ''ch'' are now alphabetically sorted between those with ''ce'' and ''ci'', instead of following ''cz'' as they used to, and similarly for ''ll''.@@@@1@25@@danf@17-8-2009 10781690@unknown@formal@none@1@S@Thus, the Spanish alphabet has the following 29 letters:@@@@1@9@@danf@17-8-2009 10781700@unknown@formal@none@1@S@:a, b, c, ch, d, e, f, g, h, i, j, k, l, ll, m, n, ñ, o, p, q, r, s, t, u, v, w, x, y, z.@@@@1@29@@danf@17-8-2009 10781710@unknown@formal@none@1@S@With the exclusion of a very small number of regional terms such as ''México'' (see [[Toponymy of Mexico]]) and some neologisms like ''software'', pronunciation can be entirely determined from spelling.@@@@1@30@@danf@17-8-2009 10781720@unknown@formal@none@1@S@A typical Spanish word is stressed on the [[syllable]] before the last if it ends with a vowel (not including ''y'') or with a vowel followed by ''n'' or ''s''; it is stressed on the last syllable otherwise.@@@@1@38@@danf@17-8-2009 10781730@unknown@formal@none@1@S@Exceptions to this rule are indicated by placing an [[acute accent]] on the [[stress (linguistics)|stressed vowel]].@@@@1@16@@danf@17-8-2009 10781740@unknown@formal@none@1@S@The acute accent is used, in addition, to distinguish between certain [[homophone]]s, especially when one of them is a stressed word and the other one is a [[clitic]]: compare {{lang|es|''el''}} ("the", masculine singular definite article) with {{lang|es|''él''}} ("he" or "it"), or {{lang|es|''te''}} ("you", object pronoun), {{lang|es|''de''}} (preposition "of" or "from"), and {{lang|es|''se''}} (reflexive pronoun) with {{lang|es|''té''}} ("tea"), {{lang|es|''dé''}} ("give") and {{lang|es|''sé''}} ("I know", or imperative "be").@@@@1@66@@danf@17-8-2009 10781750@unknown@formal@none@1@S@The interrogative pronouns ({{lang|es|''qué''}}, {{lang|es|''cuál''}}, {{lang|es|''dónde''}}, {{lang|es|''quién''}}, etc.) also receive accents in direct or indirect questions, and some demonstratives ({{lang|es|''ése''}}, {{lang|es|''éste''}}, {{lang|es|''aquél''}}, etc.) 
must be accented when used as pronouns.@@@@1@30@@danf@17-8-2009 10781760@unknown@formal@none@1@S@The conjunction {{lang|es|''o''}} ("or") is written with an accent between numerals so as not to be confused with a zero: e.g., {{lang|es|''10 ó 20''}} should be read as {{lang|es|''diez o veinte''}} rather than {{lang|es|''diez mil veinte''}} ("10,020").@@@@1@37@@danf@17-8-2009 10781770@unknown@formal@none@1@S@Accent marks are frequently omitted in capital letters (a widespread practice in the early days of computers where only lowercase vowels were available with accents), although the [[Real Academia Española|RAE]] advises against this.@@@@1@33@@danf@17-8-2009 10781780@unknown@formal@none@1@S@When ''u'' is written between ''g'' and a front vowel (''e'' or ''i''), if it should be pronounced, it is written with a [[diaeresis (diacritic)|diaeresis]] (''ü'') to indicate that it is not silent as it normally would be (e.g., ''cigüeña'', "stork", is pronounced {{IPA|/θiˈɣweɲa/}}; if it were written ''cigueña'', it would be pronounced {{IPA|/θiˈɣeɲa/}}.@@@@1@54@@danf@17-8-2009 10781790@unknown@formal@none@1@S@Interrogative and exclamatory clauses are introduced with [[Inverted question and exclamation marks|inverted question ( ¿ ) and exclamation ( ¡ ) marks]].@@@@1@22@@danf@17-8-2009 10781800@unknown@formal@none@1@S@==Sounds==@@@@1@1@@danf@17-8-2009 10781810@unknown@formal@none@1@S@The phonemic inventory listed in the following table includes [[phoneme]]s that are preserved only in some dialects, other dialects having merged them (such as ''[[yeísmo]]''); these are marked with an asterisk (*).@@@@1@32@@danf@17-8-2009 10781820@unknown@formal@none@1@S@Sounds in parentheses are [[allophone]]s.@@@@1@5@@danf@17-8-2009 10781830@unknown@formal@none@1@S@By the 16th century, the consonant system of Spanish underwent the following important changes that differentiated it from [[Iberian Romance languages|neighboring Romance languages]] such as [[Portuguese language|Portuguese]] and [[Catalan language|Catalan]]:@@@@1@30@@danf@17-8-2009 10781840@unknown@formal@none@1@S@*Initial {{IPA|/f/}}, when it had evolved into a vacillating {{IPA|/h/}}, was lost in most words (although this etymological ''h-'' is preserved in spelling and in some Andalusian dialects is still aspirated).@@@@1@31@@danf@17-8-2009 10781850@unknown@formal@none@1@S@*The [[bilabial approximant]] {{IPA|/β̞/}} (which was written ''u'' or ''v'') merged with the bilabial oclusive {{IPA|/b/}} (written ''b'').@@@@1@18@@danf@17-8-2009 10781860@unknown@formal@none@1@S@There is no difference between the pronunciation of orthographic ''b'' and ''v'' in contemporary Spanish, excepting emphatic pronunciations that cannot be considered standard or natural.@@@@1@25@@danf@17-8-2009 10781870@unknown@formal@none@1@S@*The [[voiced alveolar fricative]] {{IPA|/z/}} which existed as a separate phoneme in medieval Spanish merged with its voiceless counterpart {{IPA|/s/}}.@@@@1@20@@danf@17-8-2009 10781880@unknown@formal@none@1@S@The phoneme which resulted from this merger is currently spelled ''s''.@@@@1@11@@danf@17-8-2009 10781890@unknown@formal@none@1@S@*The [[voiced postalveolar fricative]] {{IPA|/ʒ/}} merged with its voiceless counterpart {{IPA|/ʃ/}}, which evolved into the modern velar sound {{IPA|/x/}} by the 17th century, now written with ''j'', or ''g'' before ''e, i''.@@@@1@32@@danf@17-8-2009 10781900@unknown@formal@none@1@S@Nevertheless, in most parts of Argentina and in Uruguay, ''y'' and ''ll'' have both evolved to {{IPA|/ʒ/}} or 
{{IPA|/ʃ/}}.@@@@1@19@@danf@17-8-2009 10781910@unknown@formal@none@1@S@*The [[voiced alveolar affricate]] {{IPA|/dz/}} merged with its voiceless counterpart {{IPA|/ts/}}, which then developed into the interdental {{IPA|/θ/}}, now written ''z'', or ''c'' before ''e, i''.@@@@1@26@@danf@17-8-2009 10781920@unknown@formal@none@1@S@But in [[Andalusia]], the [[Canary Islands]] and the Americas this sound merged with {{IPA|/s/}} as well.@@@@1@16@@danf@17-8-2009 10781930@unknown@formal@none@1@S@See ''[[Ceceo]]'', for further information.@@@@1@5@@danf@17-8-2009 10781940@unknown@formal@none@1@S@The consonant system of Medieval Spanish has been better preserved in [[Ladino language|Ladino]] and in Portuguese, neither of which underwent these shifts.@@@@1@22@@danf@17-8-2009 10781950@unknown@formal@none@1@S@===Lexical stress===@@@@1@2@@danf@17-8-2009 10781960@unknown@formal@none@1@S@Spanish is a [[syllable-timed language]], so each syllable has the same duration regardless of stress.@@@@1@15@@danf@17-8-2009 10781970@unknown@formal@none@1@S@Stress most often occurs on any of the last three syllables of a word, with some rare exceptions at the fourth last.@@@@1@22@@danf@17-8-2009 10781980@unknown@formal@none@1@S@The ''tendencies'' of stress assignment are as follows:@@@@1@8@@danf@17-8-2009 10781990@unknown@formal@none@1@S@* In words ending in vowels and {{IPA|/s/}}, stress most often falls on the penultimate syllable.@@@@1@16@@danf@17-8-2009 10782000@unknown@formal@none@1@S@* In words ending in all other consonants, the stress more often falls on the ultimate syllable.@@@@1@17@@danf@17-8-2009 10782010@unknown@formal@none@1@S@* Preantepenultimate stress occurs rarely and only in words like ''guardándoselos'' ('saving them for him/her') where a clitic follows certain verbal forms.@@@@1@22@@danf@17-8-2009 10782020@unknown@formal@none@1@S@In addition to the many exceptions to these tendencies, there are numerous [[minimal pair]]s which contrast solely on stress.@@@@1@19@@danf@17-8-2009 10782030@unknown@formal@none@1@S@For example, ''sabana'', with penultimate stress, means 'savannah' while ''{{lang|es|sábana}}'', with antepenultimate stress, means 'sheet'; ''{{lang|es|límite}}'' ('boundary'), ''{{lang|es|limite}}'' ('[that] he/she limits') and ''{{lang|es|limité}}'' ('I limited') also contrast solely on stress.@@@@1@30@@danf@17-8-2009 10782040@unknown@formal@none@1@S@Phonological stress may be marked orthographically with an [[acute accent]] (''ácido'', ''distinción'', etc).@@@@1@13@@danf@17-8-2009 10782050@unknown@formal@none@1@S@This is done according to the mandatory stress rules of [[Spanish orthography]] which are similar to the tendencies above (differing with words like ''distinción'') and are defined so as to unequivocally indicate where the stress lies in a given written word.@@@@1@41@@danf@17-8-2009 10782060@unknown@formal@none@1@S@An acute accent may also be used to differentiate homophones (such as ''[[wikt:té#Spanish|té]]'' for 'tea' and ''[[wikt:te#Spanish|te]]''@@@@1@17@@danf@17-8-2009 10782070@unknown@formal@none@1@S@An amusing example of the significance of intonation in Spanish is the phrase ''{{lang|es|¿Cómo "cómo como"?@@@@1@16@@danf@17-8-2009 10782080@unknown@formal@none@1@S@¡Como como como!}}''@@@@1@3@@danf@17-8-2009 10782090@unknown@formal@none@1@S@("What do you mean / 'how / do I eat'? 
/ I eat / the way / I eat!").@@@@1@19@@danf@17-8-2009 10782100@unknown@formal@none@1@S@==Grammar==@@@@1@1@@danf@17-8-2009 10782110@unknown@formal@none@1@S@Spanish is a relatively [[inflected]] language, with a two-[[Grammatical gender|gender]] system and about fifty [[Grammatical conjugation|conjugated]] forms per [[verb]], but limited inflection of [[noun]]s, [[adjective]]s, and [[determiner]]s.@@@@1@27@@danf@17-8-2009 10782120@unknown@formal@none@1@S@(For a detailed overview of verbs, see [[Spanish verbs]] and [[Spanish irregular verbs]].)@@@@1@13@@danf@17-8-2009 10782130@unknown@formal@none@1@S@It is [[Branching (linguistics)|right-branching]], uses [[preposition]]s, and usually, though not always, places [[adjective]]s after [[noun]]s.@@@@1@15@@danf@17-8-2009 10782140@unknown@formal@none@1@S@Its [[syntax]] is generally [[Subject Verb Object]], though variations are common.@@@@1@11@@danf@17-8-2009 10782150@unknown@formal@none@1@S@It is a [[pro-drop language]] (allows the deletion of pronouns when pragmatically unnecessary) and [[verb framing|verb-framed]].@@@@1@16@@danf@17-8-2009 10782160@unknown@formal@none@1@S@== Samples ==@@@@1@3@@danf@17-8-2009 10790010@unknown@formal@none@1@S@
Speech recognition
@@@@1@2@@danf@17-8-2009 10790020@unknown@formal@none@1@S@'''Speech recognition''' (also known as '''automatic speech recognition''' or '''computer speech recognition''') converts spoken words to machine-readable input (for example, to keypresses, using the binary code for a string of [[Character (computing)|character]] codes).@@@@1@33@@danf@17-8-2009 10790030@unknown@formal@none@1@S@The term [[speaker recognition|voice recognition]] may also be used to refer to speech recognition, but more precisely refers to '''speaker recognition''', which attempts to identify the person speaking, as opposed to what is being said.@@@@1@35@@danf@17-8-2009 10790040@unknown@formal@none@1@S@Speech recognition applications include voice dialing (e.g., "Call home"), call routing (e.g., "I would like to make a collect call"), [[domotic]] appliance control and content-based spoken audio search (e.g., find a podcast where particular words were spoken), simple data entry (e.g., entering a credit card number), preparation of structured documents (e.g., a radiology report), speech-to-text processing (e.g., [[word processor]]s or [[email]]s), and in aircraft [[cockpit]]s (usually termed [[Direct Voice Input]]).@@@@1@70@@danf@17-8-2009 10790050@unknown@formal@none@1@S@==History==@@@@1@1@@danf@17-8-2009 10790060@unknown@formal@none@1@S@One of the most notable domains for the commercial application of speech recognition in the United States has been health care and in particular the work of the [[medical transcription]]ist (MT).@@@@1@31@@danf@17-8-2009 10790070@unknown@formal@none@1@S@According to industry experts, at its inception, speech recognition (SR) was sold as a way to completely eliminate transcription rather than make the transcription process more efficient, hence it was not accepted.@@@@1@32@@danf@17-8-2009 10790080@unknown@formal@none@1@S@It was also the case that SR at that time was often technically deficient.@@@@1@14@@danf@17-8-2009 10790090@unknown@formal@none@1@S@Additionally, to be used effectively, it required changes to the ways physicians worked and documented clinical encounters, which many if not all were reluctant to do.@@@@1@26@@danf@17-8-2009 10790100@unknown@formal@none@1@S@The biggest limitation to speech recognition automating transcription, however, is seen as the software.@@@@1@14@@danf@17-8-2009 10790110@unknown@formal@none@1@S@The nature of narrative dictation is highly interpretive and often requires judgment that may be provided by a real human but not yet by an automated system.@@@@1@27@@danf@17-8-2009 10790120@unknown@formal@none@1@S@Another limitation has been the extensive amount of time required by the user and/or system provider to train the software.@@@@1@20@@danf@17-8-2009 10790130@unknown@formal@none@1@S@A distinction in ASR is often made between "artificial syntax systems" which are usually domain-specific and "natural language processing" which is usually language-specific.@@@@1@23@@danf@17-8-2009 10790140@unknown@formal@none@1@S@Each of these types of application presents its own particular goals and challenges.@@@@1@13@@danf@17-8-2009 10790150@unknown@formal@none@1@S@==Applications==@@@@1@1@@danf@17-8-2009 10790160@unknown@formal@none@1@S@===Health care===@@@@1@2@@danf@17-8-2009 10790170@unknown@formal@none@1@S@In the [[health care]] domain, even in the wake of improving speech recognition technologies, medical transcriptionists (MTs) have not yet become obsolete.@@@@1@22@@danf@17-8-2009 10790180@unknown@formal@none@1@S@Many experts in the field anticipate that with 
increased use of speech recognition technology, the services provided may be redistributed rather than replaced.@@@@1@23@@danf@17-8-2009 10790190@unknown@formal@none@1@S@Speech recognition can be implemented in the front end or the back end of the medical documentation process.@@@@1@14@@danf@17-8-2009 10790200@unknown@formal@none@1@S@Front-end SR is where the provider dictates into a speech-recognition engine, the recognized words are displayed right after they are spoken, and the person dictating is responsible for editing and signing off on the document.@@@@1@34@@danf@17-8-2009 10790210@unknown@formal@none@1@S@It never goes through an MT/editor.@@@@1@6@@danf@17-8-2009 10790220@unknown@formal@none@1@S@Back-end SR, or deferred SR, is where the provider dictates into a digital dictation system, the voice is routed through a speech-recognition machine, and the recognized draft document is routed, along with the original voice file, to the MT/editor, who edits the draft and finalizes the report.@@@@1@48@@danf@17-8-2009 10790230@unknown@formal@none@1@S@Deferred SR is currently in wide use in the industry.@@@@1@10@@danf@17-8-2009 10790240@unknown@formal@none@1@S@Many [[Electronic Medical Records]] (EMR) applications can be more effective and easier to use when deployed in conjunction with a speech-recognition engine.@@@@1@24@@danf@17-8-2009 10790250@unknown@formal@none@1@S@Searches, queries, and form filling may all be faster to perform by voice than by using a keyboard.@@@@1@18@@danf@17-8-2009
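A minimal sketch, in Python, of the two documentation workflows just described (front-end versus back-end/deferred speech recognition); the ''recognize'' engine, the ''Draft'' record, and the editing steps are hypothetical stand-ins rather than the API of any real dictation product.

```python
# Hypothetical sketch of the front-end and back-end (deferred) speech recognition
# workflows described above; recognize(), Draft and the editing steps are
# illustrative stand-ins, not the API of any real dictation system.

from dataclasses import dataclass


@dataclass
class Draft:
    text: str     # recognized words
    audio: bytes  # original voice file, kept so the editor can check the draft


def recognize(audio: bytes) -> str:
    """Stand-in for a speech-recognition engine."""
    return "Patient presents with ..."  # placeholder transcription


def provider_edits_and_signs(text: str) -> str:
    return text + " [edited and signed by the dictating provider]"


def mt_editor_finalizes(draft: Draft) -> str:
    return draft.text + " [edited against the audio and finalized by the MT/editor]"


def front_end_sr(audio: bytes) -> str:
    # Recognized words go straight back to the provider, who signs off; no MT/editor.
    return provider_edits_and_signs(recognize(audio))


def back_end_sr(audio: bytes) -> str:
    # The draft is routed, together with the original voice file, to an MT/editor.
    return mt_editor_finalizes(Draft(text=recognize(audio), audio=audio))


if __name__ == "__main__":
    dictation = b"...voice file..."
    print(front_end_sr(dictation))
    print(back_end_sr(dictation))
```

The only structural difference between the two paths is who receives the recognized text: the dictating provider directly, or an MT/editor together with the original audio.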
10790290@unknown@formal@none@1@S@===Military===@@@@1@1@@danf@17-8-2009 10790300@unknown@formal@none@1@S@====High-performance fighter aircraft====@@@@1@3@@danf@17-8-2009 10790310@unknown@formal@none@1@S@Substantial efforts have been devoted in the last decade to the test and evaluation of speech recognition in fighter aircraft.@@@@1@20@@danf@17-8-2009 10790320@unknown@formal@none@1@S@Of particular note are the U.S. program in speech recognition for the Advanced Fighter Technology Integration (AFTI)/[[F-16]] aircraft ([[F-16 VISTA]]), the program in France on installing speech recognition systems on [[Mirage (aircraft)|Mirage]] aircraft, and programs in the UK dealing with a variety of aircraft platforms.@@@@1@45@@danf@17-8-2009 10790330@unknown@formal@none@1@S@In these programs, speech recognizers have been operated successfully in fighter aircraft with applications including: setting radio frequencies, commanding an autopilot system, setting steer-point coordinates and weapons release parameters, and controlling flight displays.@@@@1@33@@danf@17-8-2009 10790340@unknown@formal@none@1@S@Generally, only very limited, constrained vocabularies have been used successfully, and a major effort has been devoted to integration of the speech recognizer with the avionics system.@@@@1@27@@danf@17-8-2009 10790350@unknown@formal@none@1@S@Some important conclusions from the work were as follows:@@@@1@9@@danf@17-8-2009 10790360@unknown@formal@none@1@S@#Speech recognition has definite potential for reducing pilot workload, but this potential was not realized consistently.@@@@1@16@@danf@17-8-2009 10790370@unknown@formal@none@1@S@#Achievement of very high recognition accuracy (95% or more) was the most critical factor for making the speech recognition system useful — with lower recognition rates, pilots would not use the system.@@@@1@31@@danf@17-8-2009 10790380@unknown@formal@none@1@S@#More natural vocabulary and grammar, and shorter training times would be useful, but only if very high recognition rates could be maintained.@@@@1@22@@danf@17-8-2009 10790390@unknown@formal@none@1@S@Laboratory research in robust speech recognition for military environments has produced promising results which, if extendable to the cockpit, should improve the utility of speech recognition in high-performance aircraft.@@@@1@29@@danf@17-8-2009 10790400@unknown@formal@none@1@S@Working with Swedish pilots flying in the [[JAS-39]] Gripen cockpit, Englund (2004) found recognition deteriorated with increasing G-loads.@@@@1@18@@danf@17-8-2009 10790410@unknown@formal@none@1@S@It was also concluded that adaptation greatly improved the results in all cases, and that introducing models for breathing was shown to improve recognition scores significantly.@@@@1@25@@danf@17-8-2009 10790420@unknown@formal@none@1@S@Contrary to what might be expected, no effects of the broken English of the speakers were found.@@@@1@17@@danf@17-8-2009 10790430@unknown@formal@none@1@S@It was evident that spontaneous speech caused problems for the recognizer, as could be expected.@@@@1@15@@danf@17-8-2009 10790440@unknown@formal@none@1@S@A restricted vocabulary, and above all, a proper syntax, could thus be expected to improve recognition accuracy substantially.@@@@1@18@@danf@17-8-2009 10790450@unknown@formal@none@1@S@The [[Eurofighter Typhoon]] currently in service with the UK [[RAF]] employs a speaker-dependent system, i.e. 
it requires each pilot to create a template.@@@@1@23@@danf@17-8-2009 10790460@unknown@formal@none@1@S@The system is not used for any safety-critical or weapon-critical tasks, such as weapon release or lowering of the undercarriage, but is used for a wide range of other [[cockpit]] functions.@@@@1@33@@danf@17-8-2009 10790470@unknown@formal@none@1@S@Voice commands are confirmed by visual and/or aural feedback.@@@@1@9@@danf@17-8-2009 10790480@unknown@formal@none@1@S@The system is seen as a major design feature in the reduction of pilot [[workload]], and even allows the pilot to assign targets to himself with two simple voice commands or to any of his wingmen with only five commands.@@@@1@40@@danf@17-8-2009 10790490@unknown@formal@none@1@S@====Helicopters====@@@@1@1@@danf@17-8-2009 10790500@unknown@formal@none@1@S@The problems of achieving high recognition accuracy under stress and noise pertain strongly to the helicopter environment as well as to the fighter environment.@@@@1@24@@danf@17-8-2009 10790510@unknown@formal@none@1@S@The acoustic noise problem is actually more severe in the helicopter environment, not only because of the high noise levels but also because the helicopter pilot generally does not wear a facemask, which would reduce acoustic noise in the microphone.@@@@1@40@@danf@17-8-2009 10790520@unknown@formal@none@1@S@Substantial test and evaluation programs have been carried out in the past decade on speech recognition system applications in helicopters, notably by the U.S. Army Avionics Research and Development Activity (AVRADA) and by the Royal Aerospace Establishment (RAE) in the UK.@@@@1@41@@danf@17-8-2009 10790530@unknown@formal@none@1@S@Work in France has included speech recognition in the Puma helicopter.@@@@1@11@@danf@17-8-2009 10790540@unknown@formal@none@1@S@There has also been much useful work in Canada.@@@@1@9@@danf@17-8-2009 10790550@unknown@formal@none@1@S@Results have been encouraging, and voice applications have included: control of communication radios; setting of navigation systems; and control of an automated target handover system.@@@@1@25@@danf@17-8-2009 10790560@unknown@formal@none@1@S@As in fighter applications, the overriding issue for voice in helicopters is the impact on pilot effectiveness.@@@@1@17@@danf@17-8-2009 10790570@unknown@formal@none@1@S@Encouraging results are reported for the AVRADA tests, although these represent only a feasibility demonstration in a test environment.@@@@1@19@@danf@17-8-2009 10790580@unknown@formal@none@1@S@Much remains to be done, both in speech recognition itself and in the surrounding technology, in order to consistently achieve performance improvements in operational settings.@@@@1@25@@danf@17-8-2009 10790590@unknown@formal@none@1@S@====Battle management====@@@@1@2@@danf@17-8-2009 10790600@unknown@formal@none@1@S@Battle management command centres generally require rapid access to and control of large, rapidly changing information databases.@@@@1@17@@danf@17-8-2009 10790610@unknown@formal@none@1@S@Commanders and system operators need to query these databases as conveniently as possible, in an eyes-busy environment where much of the information is presented in a display format.@@@@1@28@@danf@17-8-2009 10790620@unknown@formal@none@1@S@Human-machine interaction by voice has the potential to be very useful in these environments.@@@@1@15@@danf@17-8-2009 10790630@unknown@formal@none@1@S@A number of efforts have been undertaken to interface commercially available isolated-word recognizers into battle management
environments.@@@@1@17@@danf@17-8-2009 10790640@unknown@formal@none@1@S@In one feasibility study, speech recognition equipment was tested in conjunction with an integrated information display for naval battle management applications.@@@@1@21@@danf@17-8-2009 10790650@unknown@formal@none@1@S@Users were very optimistic about the potential of the system, although capabilities were limited.@@@@1@14@@danf@17-8-2009 10790660@unknown@formal@none@1@S@Speech understanding programs sponsored by the Defense Advanced Research Projects Agency (DARPA) in the U.S. have focused on this problem of providing a natural speech interface.@@@@1@24@@danf@17-8-2009 10790670@unknown@formal@none@1@S@Speech recognition efforts have focused on a large-vocabulary continuous speech recognition (CSR) database designed to be representative of the naval resource management task.@@@@1@27@@danf@17-8-2009 10790680@unknown@formal@none@1@S@Significant advances in the state-of-the-art in CSR have been achieved, and current efforts are focused on integrating speech recognition and natural language processing to allow spoken language interaction with a naval resource management system.@@@@1@34@@danf@17-8-2009 10790690@unknown@formal@none@1@S@====Training air traffic controllers====@@@@1@4@@danf@17-8-2009 10790700@unknown@formal@none@1@S@Training for military (or civilian) air traffic controllers (ATC) represents an excellent application for speech recognition systems.@@@@1@17@@danf@17-8-2009 10790710@unknown@formal@none@1@S@Many ATC training systems currently require a person to act as a "pseudo-pilot", engaging in a voice dialog with the trainee controller, which simulates the dialog that the controller would have to conduct with pilots in a real ATC situation.@@@@1@40@@danf@17-8-2009 10790720@unknown@formal@none@1@S@Speech recognition and synthesis techniques offer the potential to eliminate the need for a person to act as pseudo-pilot, thus reducing training and support personnel.@@@@1@25@@danf@17-8-2009 10790730@unknown@formal@none@1@S@Air controller tasks are also characterized by highly structured speech as the primary output of the controller, hence reducing the difficulty of the speech recognition task.@@@@1@26@@danf@17-8-2009 10790740@unknown@formal@none@1@S@The U.S. Naval Training Equipment Center has sponsored a number of developments of prototype ATC trainers using speech recognition.@@@@1@19@@danf@17-8-2009 10790750@unknown@formal@none@1@S@Generally, the recognition accuracy falls short of providing graceful interaction between the trainee and the system.@@@@1@16@@danf@17-8-2009 10790760@unknown@formal@none@1@S@However, the prototype training systems have demonstrated a significant potential for voice interaction in these systems, and in other training applications.@@@@1@21@@danf@17-8-2009 10790770@unknown@formal@none@1@S@The U.S.
Navy has sponsored a large-scale effort in ATC training systems, where a commercial speech recognition unit was integrated with a complex training system including displays and scenario creation.@@@@1@30@@danf@17-8-2009 10790780@unknown@formal@none@1@S@Although the recognizer was constrained in vocabulary, one of the goals of the training programs was to teach the controllers to speak in a constrained language, using a vocabulary specifically designed for the ATC task.@@@@1@35@@danf@17-8-2009 10790790@unknown@formal@none@1@S@Research in France has focused on the application of speech recognition in ATC training systems, directed at issues both in speech recognition and in the application of task-domain grammar constraints.@@@@1@29@@danf@17-8-2009 10790800@unknown@formal@none@1@S@The USAF, USMC, US Army, and FAA are currently using ATC simulators with speech recognition provided by Adacel Systems Inc (ASI).@@@@1@21@@danf@17-8-2009 10790810@unknown@formal@none@1@S@Adacel's MaxSim software uses speech recognition and synthetic speech to enable the trainee to control aircraft and ground vehicles in the simulation without the need for pseudo pilots.@@@@1@28@@danf@17-8-2009 10790820@unknown@formal@none@1@S@Adacel's ATC In A Box Software provides a synthetic ATC environment for flight simulators.@@@@1@14@@danf@17-8-2009 10790830@unknown@formal@none@1@S@The "real" pilot talks to a virtual controller using speech recognition and the virtual controller responds with synthetic speech.@@@@1@19@@danf@17-8-2009 10790850@unknown@formal@none@1@S@===Telephony and other domains===@@@@1@4@@danf@17-8-2009 10790860@unknown@formal@none@1@S@ASR in the field of telephony is now commonplace, and in the field of computer gaming and simulation it is becoming more widespread.@@@@1@22@@danf@17-8-2009 10790870@unknown@formal@none@1@S@Despite the high level of integration with word processing in general personal computing, however, ASR in the field of document production has not seen the expected increases in use.@@@@1@29@@danf@17-8-2009 10790880@unknown@formal@none@1@S@Improvements in mobile processor speeds have made it possible to create speech-enabled Symbian and Windows Mobile smartphones.@@@@1@14@@danf@17-8-2009 10790890@unknown@formal@none@1@S@Current speech-to-text programs are too large and require too much CPU power to be practical for the Pocket PC.@@@@1@19@@danf@17-8-2009 10790900@unknown@formal@none@1@S@Speech is used mostly as part of the user interface, for creating predefined or custom speech commands.@@@@1@17@@danf@17-8-2009 10790910@unknown@formal@none@1@S@Leading software vendors in this field are: Microsoft Corporation (Microsoft Voice Command); Nuance Communications (Nuance Voice Control); Vito Technology (VITO Voice2Go); Speereo Software (Speereo Voice Translator).@@@@1@26@@danf@17-8-2009 10790920@unknown@formal@none@1@S@===People with Disabilities===@@@@1@3@@danf@17-8-2009 10790930@unknown@formal@none@1@S@People with disabilities are another part of the population that benefits from using speech recognition programs.@@@@1@16@@danf@17-8-2009 10790940@unknown@formal@none@1@S@It is especially useful for people who have difficulty using or are unable to use their hands, from mild repetitive stress injuries to involved disabilities that require alternative input for support with accessing the computer.@@@@1@35@@danf@17-8-2009 10790950@unknown@formal@none@1@S@In fact, people who used the keyboard a lot and developed [[Repetitive Strain
Injury|RSI]] became an urgent early market for speech recognition.@@@@1@22@@danf@17-8-2009 10790960@unknown@formal@none@1@S@Speech recognition is used in [[deaf]] [[telephony]], such as [[spinvox]] voice-to-text voicemail, [[relay services]], and [[Telecommunications Relay Service#Captioned_telephone|captioned telephone]].@@@@1@19@@danf@17-8-2009 10790970@unknown@formal@none@1@S@===Further applications===@@@@1@2@@danf@17-8-2009 10790980@unknown@formal@none@1@S@*Automatic translation@@@@1@2@@danf@17-8-2009 10790990@unknown@formal@none@1@S@*Automotive speech recognition (e.g., [[Ford Sync]])@@@@1@6@@danf@17-8-2009 10791000@unknown@formal@none@1@S@*Telematics (e.g. vehicle navigation systems)@@@@1@5@@danf@17-8-2009 10791010@unknown@formal@none@1@S@*Court reporting (Realtime Voice Writing)@@@@1@5@@danf@17-8-2009 10791020@unknown@formal@none@1@S@*[[Hands-free computing]]: voice command recognition computer [[user interface]]@@@@1@8@@danf@17-8-2009 10791030@unknown@formal@none@1@S@*[[Home automation]]@@@@1@2@@danf@17-8-2009 10791040@unknown@formal@none@1@S@*[[Interactive voice response]]@@@@1@3@@danf@17-8-2009 10791050@unknown@formal@none@1@S@*[[Mobile telephony]], including mobile email@@@@1@5@@danf@17-8-2009 10791060@unknown@formal@none@1@S@*[[Multimodal interaction]]@@@@1@2@@danf@17-8-2009 10791070@unknown@formal@none@1@S@*[[Pronunciation]] evaluation in computer-aided language learning applications@@@@1@7@@danf@17-8-2009 10791080@unknown@formal@none@1@S@*[[Robotics]]@@@@1@1@@danf@17-8-2009 10791090@unknown@formal@none@1@S@*[[Transcription (linguistics)|Transcription]] (digital speech-to-text).@@@@1@4@@danf@17-8-2009 10791100@unknown@formal@none@1@S@*Speech-to-Text (Transcription of speech into mobile text messages)@@@@1@8@@danf@17-8-2009 10791110@unknown@formal@none@1@S@==Performance of speech recognition systems==@@@@1@5@@danf@17-8-2009 10791120@unknown@formal@none@1@S@The performance of speech recognition systems is usually specified in terms of accuracy and speed.@@@@1@15@@danf@17-8-2009 10791130@unknown@formal@none@1@S@Accuracy is usually rated with the [[word error rate]] (WER), whereas speed is measured with the [[real time factor]].@@@@1@27@@danf@17-8-2009 10791140@unknown@formal@none@1@S@Other measures of accuracy include [[Single Word Error Rate]] (SWER) and [[Command Success Rate]] (CSR).@@@@1@15@@danf@17-8-2009 10791150@unknown@formal@none@1@S@Most speech recognition users would tend to agree that dictation machines can achieve very high performance in controlled conditions.@@@@1@19@@danf@17-8-2009 10791160@unknown@formal@none@1@S@There is some confusion, however, over the interchangeability of the terms "speech recognition" and "dictation".@@@@1@15@@danf@17-8-2009 10791170@unknown@formal@none@1@S@Commercially available speaker-dependent dictation systems usually require only a short period of training (sometimes also called "enrollment") and may successfully capture continuous speech with a large vocabulary at a normal pace with very high accuracy.@@@@1@35@@danf@17-8-2009 10791180@unknown@formal@none@1@S@Most commercial companies claim that recognition software can achieve between 98% and 99% accuracy if operated under optimal conditions.@@@@1@19@@danf@17-8-2009 10791190@unknown@formal@none@1@S@"Optimal conditions" usually assume that users:@@@@1@6@@danf@17-8-2009 10791200@unknown@formal@none@1@S@* have speech characteristics which match the training data,@@@@1@9@@danf@17-8-2009 10791210@unknown@formal@none@1@S@* can achieve
proper speaker adaptation, and@@@@1@7@@danf@17-8-2009 10791220@unknown@formal@none@1@S@* work in a clean, low-noise environment (e.g. a quiet office or laboratory space).@@@@1@13@@danf@17-8-2009 10791230@unknown@formal@none@1@S@This explains why some users, especially those whose speech is heavily accented, might achieve recognition rates much lower than expected.@@@@1@20@@danf@17-8-2009 10791240@unknown@formal@none@1@S@Speech recognition in video has become a popular search technology used by several video search companies.@@@@1@16@@danf@17-8-2009 10791250@unknown@formal@none@1@S@Limited vocabulary systems, requiring no training, can recognize a small number of words (for instance, the ten digits) as spoken by most speakers.@@@@1@23@@danf@17-8-2009 10791260@unknown@formal@none@1@S@Such systems are popular for routing incoming phone calls to their destinations in large organizations.@@@@1@15@@danf@17-8-2009 10791270@unknown@formal@none@1@S@Both [[Acoustic Model|acoustic modeling]] and [[language model]]ing are important parts of modern statistically-based speech recognition algorithms.@@@@1@16@@danf@17-8-2009 10791280@unknown@formal@none@1@S@Hidden Markov models (HMMs) are widely used in many systems.@@@@1@10@@danf@17-8-2009 10791290@unknown@formal@none@1@S@Language modeling has many other applications such as [[smart keyboard]] and [[document classification]].@@@@1@13@@danf@17-8-2009 10791300@unknown@formal@none@1@S@===Hidden Markov model (HMM)-based speech recognition===@@@@1@6@@danf@17-8-2009 10791310@unknown@formal@none@1@S@Modern general-purpose speech recognition systems are generally based on [[Hidden Markov Model|HMMs]].@@@@1@12@@danf@17-8-2009 10791320@unknown@formal@none@1@S@These are statistical models which output a sequence of symbols or quantities.@@@@1@12@@danf@17-8-2009 10791330@unknown@formal@none@1@S@One possible reason why HMMs are used in speech recognition is that a speech signal could be viewed as a piecewise stationary signal or a short-time stationary signal.@@@@1@28@@danf@17-8-2009 10791340@unknown@formal@none@1@S@That is, over a short time scale on the order of 10 milliseconds, speech can be approximated as a [[stationary process]].@@@@1@22@@danf@17-8-2009 10791350@unknown@formal@none@1@S@Speech could thus be thought of as a [[Markov model]] for many stochastic processes.@@@@1@14@@danf@17-8-2009 10791360@unknown@formal@none@1@S@Another reason why HMMs are popular is that they can be trained automatically and are simple and computationally feasible to use.@@@@1@21@@danf@17-8-2009 10791370@unknown@formal@none@1@S@In speech recognition, the hidden Markov model would output a sequence of ''n''-dimensional real-valued vectors (with ''n'' being a small integer, such as 10), outputting one of these every 10 milliseconds.@@@@1@31@@danf@17-8-2009 10791380@unknown@formal@none@1@S@The vectors would consist of [[cepstrum|cepstral]] coefficients, which are obtained by taking a [[Fourier transform]] of a short time window of speech and decorrelating the spectrum using a [[cosine transform]], then taking the first (most significant) coefficients.@@@@1@37@@danf@17-8-2009 10791390@unknown@formal@none@1@S@The hidden Markov model will tend to have in each state a statistical distribution that is a mixture of diagonal covariance Gaussians which will give a likelihood for each observed vector.@@@@1@31@@danf@17-8-2009 10791400@unknown@formal@none@1@S@Each word, or (for more general speech recognition systems), each [[phoneme]], will have a different output distribution; a hidden
Markov model for a sequence of words or phonemes is made by concatenating the individual trained hidden Markov models for the separate words and phonemes.@@@@1@44@@danf@17-8-2009 10791410@unknown@formal@none@1@S@Described above are the core elements of the most common, HMM-based approach to speech recognition.@@@@1@15@@danf@17-8-2009 10791420@unknown@formal@none@1@S@Modern speech recognition systems use various combinations of a number of standard techniques in order to improve results over the basic approach described above.@@@@1@24@@danf@17-8-2009 10791430@unknown@formal@none@1@S@A typical large-vocabulary system would need context dependency for the phonemes (so phonemes with different left and right context have different realizations as HMM states); it would use cepstral normalization to normalize for different speaker and recording conditions; for further speaker normalization it might use vocal tract length normalization (VTLN) for male-female normalization and maximum likelihood linear regression (MLLR) for more general speaker adaptation.@@@@1@64@@danf@17-8-2009 10791440@unknown@formal@none@1@S@The features would have so-called delta and delta-delta coefficients to capture speech dynamics and in addition might use heteroscedastic linear discriminant analysis (HLDA); or might skip the delta and delta-delta coefficients and use splicing and an LDA-based projection followed perhaps by heteroscedastic linear discriminant analysis or a global semitied covariance transform (also known as maximum likelihood linear transform, or MLLT).@@@@1@60@@danf@17-8-2009 10791450@unknown@formal@none@1@S@Many systems use so-called discriminative training techniques which dispense with a purely statistical approach to HMM parameter estimation and instead optimize some classification-related measure of the training data.@@@@1@28@@danf@17-8-2009 10791460@unknown@formal@none@1@S@Examples are maximum [[mutual information]] (MMI), minimum classification error (MCE) and minimum phone error (MPE).@@@@1@15@@danf@17-8-2009 10791470@unknown@formal@none@1@S@Decoding of the speech (the term for what happens when the system is presented with a new utterance and must compute the most likely source sentence) would probably use the [[Viterbi algorithm]] to find the best path, and here there is a choice between dynamically creating a combination hidden Markov model which includes both the acoustic and language model information, or combining it statically beforehand (the [[finite state transducer]], or FST, approach).@@@@1@72@@danf@17-8-2009 10791480@unknown@formal@none@1@S@===Dynamic time warping (DTW)-based speech recognition===@@@@1@6@@danf@17-8-2009 10791490@unknown@formal@none@1@S@Dynamic time warping is an approach that was historically used for speech recognition but has now largely been displaced by the more successful HMM-based approach.@@@@1@25@@danf@17-8-2009 10791500@unknown@formal@none@1@S@Dynamic time warping is an algorithm for measuring similarity between two sequences which may vary in time or speed.@@@@1@19@@danf@17-8-2009 10791510@unknown@formal@none@1@S@For instance, similarities in walking patterns would be detected, even if in one video the person was walking slowly and if in another they were walking more quickly, or even if there were accelerations and decelerations during the course of one observation.@@@@1@42@@danf@17-8-2009 10791520@unknown@formal@none@1@S@DTW has been applied to video, audio, and graphics – indeed, any data which can be turned into a linear representation can be analyzed 
with DTW.@@@@1@25@@danf@17-8-2009 10791530@unknown@formal@none@1@S@A well known application has been automatic speech recognition, to cope with different speaking speeds.@@@@1@15@@danf@17-8-2009 10791540@unknown@formal@none@1@S@In general, it is a method that allows a computer to find an optimal match between two given sequences (e.g. time series) with certain restrictions, i.e. the sequences are "warped" non-linearly to match each other.@@@@1@35@@danf@17-8-2009 10791550@unknown@formal@none@1@S@This sequence alignment method is often used in the context of hidden Markov models.@@@@1@14@@danf@17-8-2009 10791560@unknown@formal@none@1@S@==Further information==@@@@1@2@@danf@17-8-2009 10791570@unknown@formal@none@1@S@Popular speech recognition conferences held each year or two include ICASSP, Eurospeech/ICSLP (now named Interspeech) and the IEEE ASRU.@@@@1@19@@danf@17-8-2009 10791580@unknown@formal@none@1@S@Conferences in the field of [[Natural Language Processing]], such as ACL, NAACL, EMNLP, and HLT, are beginning to include papers on speech processing.@@@@1@23@@danf@17-8-2009 10791590@unknown@formal@none@1@S@Important journals include the [[IEEE]] Transactions on Speech and Audio Processing (now named [[IEEE]] Transactions on Audio, Speech and Language Processing), Computer Speech and Language, and Speech Communication.@@@@1@28@@danf@17-8-2009 10791600@unknown@formal@none@1@S@Books like "Fundamentals of Speech Recognition" by [[Lawrence Rabiner]] can be useful to acquire basic knowledge but may not be fully up to date (1993).@@@@1@25@@danf@17-8-2009 10791610@unknown@formal@none@1@S@Another good source can be "Statistical Methods for Speech Recognition" by Frederick Jelinek which is a more up to date book (1998).@@@@1@22@@danf@17-8-2009 10791620@unknown@formal@none@1@S@Even more up to date is "Computer Speech", by Manfred R. Schroeder, second edition published in 2004.@@@@1@17@@danf@17-8-2009 10791630@unknown@formal@none@1@S@A good insight into the techniques used in the best modern systems can be gained by paying attention to government sponsored evaluations such as those organised by [[DARPA]] (the largest speech recognition-related project ongoing as of 2007 is the GALE project, which involves both speech recognition and translation components).@@@@1@49@@danf@17-8-2009 10791640@unknown@formal@none@1@S@In terms of freely available resources, the [[HTK (software)|HTK]] book (and the accompanying HTK toolkit) is one place to start to both learn about speech recognition and to start experimenting.@@@@1@30@@danf@17-8-2009 10791650@unknown@formal@none@1@S@Another such resource is [[Carnegie Mellon University]]'s SPHINX toolkit.@@@@1@9@@danf@17-8-2009 10791660@unknown@formal@none@1@S@The AT&T libraries [http://www.research.att.com/projects/mohri/fsm FSM Library], [http://www.research.att.com/projects/mohri/grm GRM library], and [http://www.cs.nyu.edu/~mohri DCD library] are also general software libraries for large-vocabulary speech recognition.@@@@1@22@@danf@17-8-2009 10791670@unknown@formal@none@1@S@A useful review of the area of robustness in ASR is provided by Junqua and Haton (1995).@@@@1@17@@danf@17-8-2009 10800010@unknown@formal@none@1@S@
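As a rough, illustrative sketch of the dynamic time warping alignment described in the section above (not taken from any particular toolkit; the data and function names below are invented), the following Python code aligns two one-dimensional sequences, such as per-frame energy values from a slow and a fast rendition of the same utterance, and returns the total alignment cost:

<source lang="python">
# Minimal dynamic time warping (DTW) sketch: aligns two 1-D sequences
# (e.g. per-frame energies of two utterances) and returns the total
# alignment cost.  Purely illustrative; real systems use feature vectors.
def dtw_distance(a, b):
    n, m = len(a), len(b)
    INF = float("inf")
    # cost[i][j] = cheapest way of aligning a[:i] with b[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])              # local frame distance
            cost[i][j] = d + min(cost[i - 1][j],      # stretch a
                                 cost[i][j - 1],      # stretch b
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]

# A slow and a fast rendition of the same "pattern" still align cheaply,
# because the warping path may stretch or compress either sequence.
slow = [0, 0, 1, 2, 3, 3, 2, 1, 0, 0]
fast = [0, 1, 2, 3, 2, 1, 0]
print(dtw_distance(slow, fast))
</source>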
Speech synthesis
@@@@1@2@@danf@17-8-2009 10800020@unknown@formal@none@1@S@'''Speech synthesis''' is the artificial production of human [[Speech communication|speech]].@@@@1@10@@danf@17-8-2009 10800030@unknown@formal@none@1@S@A computer system used for this purpose is called a '''speech synthesizer''', and can be implemented in [[software]] or [[Computer hardware|hardware]].@@@@1@21@@danf@17-8-2009 10800040@unknown@formal@none@1@S@A '''text-to-speech (TTS)''' system converts normal language text into speech; other systems render [[symbolic linguistic representation]]s like [[phonetic transcription]]s into speech.@@@@1@21@@danf@17-8-2009 10800050@unknown@formal@none@1@S@Synthesized speech can be created by concatenating pieces of recorded speech that are stored in a [[database]].@@@@1@17@@danf@17-8-2009 10800060@unknown@formal@none@1@S@Systems differ in the size of the stored speech units; a system that stores [[phone]]s or [[diphone]]s provides the largest output range, but may lack clarity.@@@@1@26@@danf@17-8-2009 10800070@unknown@formal@none@1@S@For specific usage domains, the storage of entire words or sentences allows for high-quality output.@@@@1@15@@danf@17-8-2009 10800080@unknown@formal@none@1@S@Alternatively, a synthesizer can incorporate a model of the [[vocal tract]] and other human voice characteristics to create a completely "synthetic" voice output.@@@@1@23@@danf@17-8-2009 10800090@unknown@formal@none@1@S@The quality of a speech synthesizer is judged by its similarity to the human voice, and by its ability to be understood.@@@@1@22@@danf@17-8-2009 10800100@unknown@formal@none@1@S@An intelligible text-to-speech program allows people with [[visual impairment]]s or [[reading disability|reading disabilities]] to listen to written works on a home computer.@@@@1@22@@danf@17-8-2009 10800110@unknown@formal@none@1@S@Many computer operating systems have included speech synthesizers since the early 1980s.@@@@1@12@@danf@17-8-2009 10800120@unknown@formal@none@1@S@== Overview of text processing ==@@@@1@6@@danf@17-8-2009 10800130@unknown@formal@none@1@S@A text-to-speech system (or "engine") is composed of two parts: a [[front-end]] and a back-end.@@@@1@15@@danf@17-8-2009 10800140@unknown@formal@none@1@S@The front-end has two major tasks.@@@@1@6@@danf@17-8-2009 10800150@unknown@formal@none@1@S@First, it converts raw text containing symbols like numbers and abbreviations into the equivalent of written-out words.@@@@1@17@@danf@17-8-2009 10800160@unknown@formal@none@1@S@This process is often called ''text normalization'', ''pre-processing'', or ''[[tokenization]]''.@@@@1@10@@danf@17-8-2009 10800170@unknown@formal@none@1@S@The front-end then assigns [[phonetic transcription]]s to each word, and divides and marks the text into [[prosody (linguistics)|prosodic units]], like [[phrase]]s, [[clause]]s, and [[sentence (linguistics)|sentence]]s.@@@@1@25@@danf@17-8-2009 10800180@unknown@formal@none@1@S@The process of assigning phonetic transcriptions to words is called ''text-to-phoneme'' or ''[[grapheme]]-to-phoneme'' conversion.@@@@1@14@@danf@17-8-2009 10800190@unknown@formal@none@1@S@Phonetic transcriptions and prosody information together make up the symbolic linguistic representation that is output by the front-end.@@@@1@18@@danf@17-8-2009 10800200@unknown@formal@none@1@S@The back-end—often referred to as the ''synthesizer''—then converts the symbolic linguistic representation into sound.@@@@1@14@@danf@17-8-2009 10800210@unknown@formal@none@1@S@== History ==@@@@1@3@@danf@17-8-2009 
10800220@unknown@formal@none@1@S@Long before [[electronics|electronic]] [[signal processing]] was invented, there were those who tried to build machines to create human speech.@@@@1@19@@danf@17-8-2009 10800230@unknown@formal@none@1@S@Some early legends of the existence of [[Brazen Head|"speaking heads"]] involved [[Pope Silvester II|Gerbert of Aurillac]] (d. 1003 AD), [[Albertus Magnus]] (1198–1280), and [[Roger Bacon]] (1214–1294).@@@@1@26@@danf@17-8-2009 10800240@unknown@formal@none@1@S@In 1779, the [[Denmark|Danish]] scientist Christian Kratzenstein, working at the [[Russian Academy of Sciences]], built models of the human [[vocal tract]] that could produce the five long [[vowel]] sounds (in [[help:IPA|International Phonetic Alphabet]] notation, they are {{IPA|[aː]}}, {{IPA|[eː]}}, {{IPA|[iː]}}, {{IPA|[oː]}} and {{IPA|[uː]}}).@@@@1@42@@danf@17-8-2009 10800250@unknown@formal@none@1@S@This was followed by the [[bellows]]-operated "acoustic-mechanical speech machine" by [[Wolfgang von Kempelen]] of [[Vienna]], [[Austria]], described in a 1791 paper.@@@@1@21@@danf@17-8-2009 10800260@unknown@formal@none@1@S@This machine added models of the tongue and lips, enabling it to produce [[consonant]]s as well as vowels.@@@@1@18@@danf@17-8-2009 10800270@unknown@formal@none@1@S@In 1837, [[Charles Wheatstone]] produced a "speaking machine" based on von Kempelen's design, and in 1857, M. Faber built the "Euphonia".@@@@1@21@@danf@17-8-2009 10800280@unknown@formal@none@1@S@Wheatstone's design was resurrected in 1923 by Paget.@@@@1@8@@danf@17-8-2009 10800290@unknown@formal@none@1@S@In the 1930s, [[Bell Labs]] developed the [[Vocoder|VOCODER]], a keyboard-operated electronic speech analyzer and synthesizer that was said to be clearly intelligible.@@@@1@22@@danf@17-8-2009 10800300@unknown@formal@none@1@S@[[Homer Dudley]] refined this device into the VODER, which he exhibited at the [[1939 New York World's Fair]].@@@@1@18@@danf@17-8-2009 10800310@unknown@formal@none@1@S@The [[Pattern playback]] was built by [[Franklin S. Cooper|Dr. Franklin S. 
Cooper]] and his colleagues at [[Haskins Laboratories]] in the late 1940s and completed in 1950.@@@@1@26@@danf@17-8-2009 10800320@unknown@formal@none@1@S@There were several different versions of this hardware device but only one currently survives.@@@@1@14@@danf@17-8-2009 10800330@unknown@formal@none@1@S@The machine converts pictures of the acoustic patterns of speech in the form of a spectrogram back into sound.@@@@1@19@@danf@17-8-2009 10800340@unknown@formal@none@1@S@Using this device, [[Alvin Liberman]] and colleagues were able to discover acoustic cues for the perception of [[phonetic]] segments (consonants and vowels).@@@@1@22@@danf@17-8-2009 10800350@unknown@formal@none@1@S@Early electronic speech synthesizers sounded robotic and were often barely intelligible.@@@@1@11@@danf@17-8-2009 10800360@unknown@formal@none@1@S@However, the quality of synthesized speech has steadily improved, and output from contemporary speech synthesis systems is sometimes indistinguishable from actual human speech.@@@@1@23@@danf@17-8-2009 10800370@unknown@formal@none@1@S@=== Electronic devices ===@@@@1@4@@danf@17-8-2009 10800380@unknown@formal@none@1@S@The first computer-based speech synthesis systems were created in the late 1950s, and the first complete text-to-speech system was completed in 1968.@@@@1@22@@danf@17-8-2009 10800390@unknown@formal@none@1@S@In 1961, physicist [[John Larry Kelly, Jr]] and colleague Louis Gerstman used an [[IBM 704]] computer to synthesize speech, an event among the most prominent in the history of [[Bell Labs]].@@@@1@31@@danf@17-8-2009 10800400@unknown@formal@none@1@S@Kelly's voice recorder synthesizer (vocoder) recreated the song "[[Daisy Bell]]", with musical accompaniment from [[Max Mathews]].@@@@1@16@@danf@17-8-2009 10800410@unknown@formal@none@1@S@Coincidentally, [[Arthur C. 
Clarke]] was visiting his friend and colleague John Pierce at the Bell Labs Murray Hill facility.@@@@1@19@@danf@17-8-2009 10800420@unknown@formal@none@1@S@Clarke was so impressed by the demonstration that he used it in the climactic scene of his screenplay for his novel ''[[2001: A Space Odyssey (novel)|2001: A Space Odyssey]]'', where the [[HAL 9000]] computer sings the same song as it is being put to sleep by astronaut [[Dave Bowman]].@@@@1@49@@danf@17-8-2009 10800430@unknown@formal@none@1@S@Despite the success of purely electronic speech synthesis, research is still being conducted into mechanical speech synthesizers.@@@@1@17@@danf@17-8-2009 10800440@unknown@formal@none@1@S@== Synthesizer technologies ==@@@@1@4@@danf@17-8-2009 10800450@unknown@formal@none@1@S@The most important qualities of a speech synthesis system are ''naturalness'' and ''[[Intelligibility]]''.@@@@1@13@@danf@17-8-2009 10800460@unknown@formal@none@1@S@Naturalness describes how closely the output sounds like human speech, while intelligibility is the ease with which the output is understood.@@@@1@21@@danf@17-8-2009 10800470@unknown@formal@none@1@S@The ideal speech synthesizer is both natural and intelligible.@@@@1@9@@danf@17-8-2009 10800480@unknown@formal@none@1@S@Speech synthesis systems usually try to maximize both characteristics.@@@@1@9@@danf@17-8-2009 10800490@unknown@formal@none@1@S@The two primary technologies for generating synthetic speech waveforms are ''concatenative synthesis'' and ''[[formant]] synthesis''.@@@@1@15@@danf@17-8-2009 10800500@unknown@formal@none@1@S@Each technology has strengths and weaknesses, and the intended uses of a synthesis system will typically determine which approach is used.@@@@1@21@@danf@17-8-2009 10800510@unknown@formal@none@1@S@=== Concatenative synthesis ===@@@@1@4@@danf@17-8-2009 10800520@unknown@formal@none@1@S@Concatenative synthesis is based on the [[concatenation]] (or stringing together) of segments of recorded speech.@@@@1@15@@danf@17-8-2009 10800530@unknown@formal@none@1@S@Generally, concatenative synthesis produces the most natural-sounding synthesized speech.@@@@1@9@@danf@17-8-2009 10800540@unknown@formal@none@1@S@However, differences between natural variations in speech and the nature of the automated techniques for segmenting the waveforms sometimes result in audible glitches in the output.@@@@1@26@@danf@17-8-2009 10800550@unknown@formal@none@1@S@There are three main sub-types of concatenative synthesis.@@@@1@8@@danf@17-8-2009 10800560@unknown@formal@none@1@S@
==== Unit selection synthesis ====@@@@1@6@@danf@17-8-2009 10800570@unknown@formal@none@1@S@Unit selection synthesis uses large [[database]]s of recorded speech.@@@@1@9@@danf@17-8-2009 10800580@unknown@formal@none@1@S@During database creation, each recorded utterance is segmented into some or all of the following: individual [[phone]]s, [[diphone]]s, half-phones, [[syllable]]s, [[morpheme]]s, [[word]]s, [[phrase]]s, and [[Sentence (linguistics)|sentence]]s.@@@@1@26@@danf@17-8-2009 10800590@unknown@formal@none@1@S@Typically, the division into segments is done using a specially modified [[speech recognition|speech recognizer]] set to a "forced alignment" mode with some manual correction afterward, using visual representations such as the [[waveform]] and [[spectrogram]].@@@@1@34@@danf@17-8-2009 10800600@unknown@formal@none@1@S@An [[index (database)|index]] of the units in the speech database is then created based on the segmentation and acoustic parameters like the [[fundamental frequency]] ([[pitch (music)|pitch]]), duration, position in the syllable, and neighboring phones.@@@@1@34@@danf@17-8-2009 10800610@unknown@formal@none@1@S@At [[runtime]], the desired target utterance is created by determining the best chain of candidate units from the database (unit selection).@@@@1@21@@danf@17-8-2009 10800620@unknown@formal@none@1@S@This process is typically achieved using a specially weighted [[decision tree]].@@@@1@11@@danf@17-8-2009 10800630@unknown@formal@none@1@S@Unit selection provides the greatest naturalness, because it applies only a small amount of [[digital signal processing]] (DSP) to the recorded speech.@@@@1@22@@danf@17-8-2009 10800640@unknown@formal@none@1@S@DSP often makes recorded speech sound less natural, although some systems use a small amount of signal processing at the point of concatenation to smooth the waveform.@@@@1@27@@danf@17-8-2009 10800650@unknown@formal@none@1@S@The output from the best unit-selection systems is often indistinguishable from real human voices, especially in contexts for which the TTS system has been tuned.@@@@1@25@@danf@17-8-2009 10800660@unknown@formal@none@1@S@However, maximum naturalness typically requires unit-selection speech databases to be very large, in some systems ranging into the [[gigabyte]]s of recorded data, representing dozens of hours of speech.@@@@1@28@@danf@17-8-2009 10800670@unknown@formal@none@1@S@Also, unit selection algorithms have been known to select segments from a place that results in less than ideal synthesis (e.g. minor words become unclear) even when a better choice exists in the database.
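The selection step described above can be illustrated with a small sketch (a toy model only, not the algorithm of any particular system): each target unit has several recorded candidates, and the chain that minimizes a combined target cost and concatenation (join) cost is found by dynamic programming. The pitch-based costs, the inventory and all names below are invented for the example.

<source lang="python">
# Toy unit-selection sketch.  For each target unit several recorded
# candidates exist; pick the chain minimizing target cost (mismatch with
# the desired prosody) plus join cost (discontinuity between neighbours).
def target_cost(candidate, target_pitch):
    return abs(candidate["pitch"] - target_pitch)

def join_cost(prev, cur):
    # penalize pitch discontinuities at the concatenation point
    return 0.5 * abs(prev["pitch"] - cur["pitch"])

def select_units(candidates_per_unit, target_pitches):
    # best[i][k] = (cheapest cost ending in candidate k of unit i, backpointer)
    best = []
    for i, (cands, tp) in enumerate(zip(candidates_per_unit, target_pitches)):
        row = []
        for cand in cands:
            tc = target_cost(cand, tp)
            if i == 0:
                row.append((tc, None))
            else:
                prev_cands = candidates_per_unit[i - 1]
                total, back = min(
                    (best[i - 1][p][0] + join_cost(prev_cands[p], cand) + tc, p)
                    for p in range(len(prev_cands))
                )
                row.append((total, back))
        best.append(row)
    # trace back the cheapest chain of candidates
    k = min(range(len(best[-1])), key=lambda x: best[-1][x][0])
    chain = []
    for i in range(len(best) - 1, -1, -1):
        chain.append(candidates_per_unit[i][k])
        k = best[i][k][1]
    return list(reversed(chain))

units = [
    [{"name": "h_1", "pitch": 110}, {"name": "h_2", "pitch": 130}],
    [{"name": "e_1", "pitch": 115}, {"name": "e_2", "pitch": 140}],
]
print([u["name"] for u in select_units(units, target_pitches=[120, 120])])
</source>

Real unit-selection systems weigh many more features (duration, spectral mismatch, phonetic context) over far larger inventories, but a dynamic-programming search of this general shape is commonly used.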
@@@@1@34@@danf@17-8-2009 10800680@unknown@formal@none@1@S@
==== Diphone synthesis ====@@@@1@5@@danf@17-8-2009 10800690@unknown@formal@none@1@S@Diphone synthesis uses a minimal speech database containing all the [[diphone]]s (sound-to-sound transitions) occurring in a language.@@@@1@17@@danf@17-8-2009 10800700@unknown@formal@none@1@S@The number of diphones depends on the [[phonotactics]] of the language: for example, Spanish has about 800 diphones, and German about 2500.@@@@1@22@@danf@17-8-2009 10800710@unknown@formal@none@1@S@In diphone synthesis, only one example of each diphone is contained in the speech database.@@@@1@15@@danf@17-8-2009 10800720@unknown@formal@none@1@S@At runtime, the target [[prosody]] of a sentence is superimposed on these minimal units by means of [[digital signal processing]] techniques such as [[linear predictive coding]], [[PSOLA]] or [[MBROLA]].@@@@1@29@@danf@17-8-2009 10800730@unknown@formal@none@1@S@The quality of the resulting speech is generally worse than that of unit-selection systems, but more natural-sounding than the output of formant synthesizers.@@@@1@23@@danf@17-8-2009 10800740@unknown@formal@none@1@S@Diphone synthesis suffers from the sonic glitches of concatenative synthesis and the robotic-sounding nature of formant synthesis, and has few of the advantages of either approach other than small size.@@@@1@30@@danf@17-8-2009 10800750@unknown@formal@none@1@S@As such, its use in commercial applications is declining, although it continues to be used in research because there are a number of freely available software implementations.
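A minimal sketch of the bookkeeping behind this approach, assuming units are simple sound-to-sound transition labels (the phoneme symbols and the toy inventory below are invented): a word's phoneme string is decomposed into the diphones that would have to be concatenated, and checked against the inventory.

<source lang="python">
# Toy diphone bookkeeping: decompose a phoneme string into the
# sound-to-sound transitions (diphones) that a diphone synthesizer
# would concatenate, each of which must exist once in the inventory.
INVENTORY = {"#-h", "h-e", "e-l", "l-o", "o-#"}   # "#" marks silence

def to_diphones(phonemes):
    padded = ["#"] + phonemes + ["#"]
    return [a + "-" + b for a, b in zip(padded, padded[1:])]

def check_coverage(phonemes):
    needed = to_diphones(phonemes)
    missing = [d for d in needed if d not in INVENTORY]
    return needed, missing

needed, missing = check_coverage(["h", "e", "l", "o"])
print("diphones needed:", needed)          # ['#-h', 'h-e', 'e-l', 'l-o', 'o-#']
print("missing from inventory:", missing)  # []
</source>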
@@@@1@27@@danf@17-8-2009 10800760@unknown@formal@none@1@S@
==== Domain-specific synthesis ====@@@@1@5@@danf@17-8-2009 10800770@unknown@formal@none@1@S@Domain-specific synthesis concatenates prerecorded words and phrases to create complete utterances.@@@@1@11@@danf@17-8-2009 10800780@unknown@formal@none@1@S@It is used in applications where the variety of texts the system will output is limited to a particular domain, like transit schedule announcements or weather reports.@@@@1@27@@danf@17-8-2009 10800790@unknown@formal@none@1@S@The technology is very simple to implement, and has been in commercial use for a long time, in devices like talking clocks and calculators.@@@@1@24@@danf@17-8-2009 10800800@unknown@formal@none@1@S@The level of naturalness of these systems can be very high because the variety of sentence types is limited, and they closely match the prosody and intonation of the original recordings.@@@@1@31@@danf@17-8-2009 10800810@unknown@formal@none@1@S@Because these systems are limited by the words and phrases in their databases, they are not general-purpose and can only synthesize the combinations of words and phrases with which they have been preprogrammed.@@@@1@33@@danf@17-8-2009 10800820@unknown@formal@none@1@S@The blending of words within naturally spoken language, however, can still cause problems unless the many variations are taken into account.@@@@1@21@@danf@17-8-2009 10800830@unknown@formal@none@1@S@For example, in [[Rhotic and non-rhotic accents|non-rhotic]] dialects of English the "r" in words like "clear" {{IPA|/ˈkliːə/}} is usually only pronounced when the following word has a vowel as its first letter (e.g. "clear out" is realized as {{IPA|/ˌkliːəɹˈɑʊt/}}).@@@@1@39@@danf@17-8-2009 10800840@unknown@formal@none@1@S@Likewise in [[French language|French]], many final consonants are no longer silent if followed by a word that begins with a vowel, an effect called [[Liaison (French)|liaison]].@@@@1@26@@danf@17-8-2009 10800845@unknown@formal@none@1@S@This [[alternation (linguistics)|alternation]] cannot be reproduced by a simple word-concatenation system, which would require additional complexity to be [[context-sensitive]].
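To illustrate the kind of context-sensitivity described above, the following sketch chooses between two recorded variants of a word depending on whether the next word begins with a vowel letter, roughly modelling the linking "r" and French liaison cases; the word list, file names and vowel test are invented simplifications.

<source lang="python">
# Toy context-sensitive word concatenation: some words have two recorded
# variants, and the one used depends on whether the next word starts
# with a vowel (a crude stand-in for linking-r / liaison rules).
VARIANTS = {
    "clear": {"before_vowel": "clear_r.wav", "default": "clear.wav"},
    "les":   {"before_vowel": "les_z.wav",   "default": "les.wav"},
}
VOWELS = set("aeiou")

def pick_recordings(words):
    clips = []
    for i, word in enumerate(words):
        nxt = words[i + 1] if i + 1 < len(words) else ""
        variants = VARIANTS.get(word)
        if variants and nxt[:1].lower() in VOWELS:
            clips.append(variants["before_vowel"])
        elif variants:
            clips.append(variants["default"])
        else:
            clips.append(word + ".wav")
    return clips

print(pick_recordings(["clear", "out"]))  # ['clear_r.wav', 'out.wav']
print(pick_recordings(["clear", "the"]))  # ['clear.wav', 'the.wav']
</source>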
@@@@1@19@@danf@17-8-2009 10800850@unknown@formal@none@1@S@=== Formant synthesis ===@@@@1@4@@danf@17-8-2009 10800860@unknown@formal@none@1@S@[[Formant]] synthesis does not use human speech samples at runtime.@@@@1@10@@danf@17-8-2009 10800870@unknown@formal@none@1@S@Instead, the synthesized speech output is created using an acoustic model.@@@@1@11@@danf@17-8-2009 10800880@unknown@formal@none@1@S@Parameters such as [[fundamental frequency]], [[phonation|voicing]], and [[noise]] levels are varied over time to create a [[waveform]] of artificial speech.@@@@1@20@@danf@17-8-2009 10800890@unknown@formal@none@1@S@This method is sometimes called ''rules-based synthesis''; however, many concatenative systems also have rules-based components.@@@@1@15@@danf@17-8-2009 10800900@unknown@formal@none@1@S@Many systems based on formant synthesis technology generate artificial, robotic-sounding speech that would never be mistaken for human speech.@@@@1@19@@danf@17-8-2009 10800910@unknown@formal@none@1@S@However, maximum naturalness is not always the goal of a speech synthesis system, and formant synthesis systems have advantages over concatenative systems.@@@@1@22@@danf@17-8-2009 10800920@unknown@formal@none@1@S@Formant-synthesized speech can be reliably intelligible, even at very high speeds, avoiding the acoustic glitches that commonly plague concatenative systems.@@@@1@20@@danf@17-8-2009 10800930@unknown@formal@none@1@S@High-speed synthesized speech is used by the visually impaired to quickly navigate computers using a [[screen reader]].@@@@1@17@@danf@17-8-2009 10800940@unknown@formal@none@1@S@Formant synthesizers are usually smaller programs than concatenative systems because they do not have a database of speech samples.@@@@1@19@@danf@17-8-2009 10800950@unknown@formal@none@1@S@They can therefore be used in [[embedded system]]s, where [[data storage device|memory]] and [[microprocessor]] power are especially limited.@@@@1@18@@danf@17-8-2009 10800960@unknown@formal@none@1@S@Because formant-based systems have complete control of all aspects of the output speech, a wide variety of prosodies and [[Intonation (linguistics)|intonation]]s can be output, conveying not just questions and statements, but a variety of emotions and tones of voice.@@@@1@39@@danf@17-8-2009 10800970@unknown@formal@none@1@S@Examples of non-real-time but highly accurate intonation control in formant synthesis include the work done in the late 1970s for the [[Texas Instruments]] toy [[Speak & Spell (game)|Speak & Spell]], and in the early 1980s [[Sega]] [[Video arcade|arcade]] machines.@@@@1@39@@danf@17-8-2009 10800980@unknown@formal@none@1@S@Creating proper intonation for these projects was painstaking, and the results have yet to be matched by real-time text-to-speech interfaces.@@@@1@20@@danf@17-8-2009 10800990@unknown@formal@none@1@S@=== Articulatory synthesis ===@@@@1@4@@danf@17-8-2009 10801000@unknown@formal@none@1@S@[[Articulatory synthesis]] refers to computational techniques for synthesizing speech based on models of the human [[vocal tract]] and the articulation processes occurring there.@@@@1@23@@danf@17-8-2009 10801010@unknown@formal@none@1@S@The first articulatory synthesizer regularly used for laboratory experiments was developed at [[Haskins Laboratories]] in the mid-1970s by [[Philip Rubin]], Tom Baer, and Paul Mermelstein.@@@@1@25@@danf@17-8-2009 10801020@unknown@formal@none@1@S@This synthesizer, known as ASY, was based on vocal tract models developed at [[Bell Laboratories]] in the 1960s and 1970s by Paul Mermelstein, 
Cecil Coker, and colleagues.@@@@1@27@@danf@17-8-2009 10801030@unknown@formal@none@1@S@Until recently, articulatory synthesis models have not been incorporated into commercial speech synthesis systems.@@@@1@14@@danf@17-8-2009 10801040@unknown@formal@none@1@S@A notable exception is the [[NeXT]]-based system originally developed and marketed by Trillium Sound Research, a spin-off company of the [[University of Calgary]], where much of the original research was conducted.@@@@1@31@@danf@17-8-2009 10801050@unknown@formal@none@1@S@Following the demise of the various incarnations of NeXT (started by [[Steve Jobs]] in the late 1980s and merged with Apple Computer in 1997), the Trillium software was published under the [[GNU General Public License]], with work continuing as ''gnuspeech''.@@@@1@40@@danf@17-8-2009 10801060@unknown@formal@none@1@S@The system, first marketed in 1994, provides full articulatory-based text-to-speech conversion using a waveguide or transmission-line analog of the human oral and nasal tracts controlled by Carré's "distinctive region model".@@@@1@30@@danf@17-8-2009 10801070@unknown@formal@none@1@S@=== HMM-based synthesis ===@@@@1@4@@danf@17-8-2009 10801080@unknown@formal@none@1@S@HMM-based synthesis is a synthesis method based on [[hidden Markov model]]s.@@@@1@11@@danf@17-8-2009 10801090@unknown@formal@none@1@S@In this system, the [[frequency spectrum]] ([[vocal tract]]), [[fundamental frequency]] (vocal source), and duration ([[prosody]]) of speech are modeled simultaneously by HMMs.@@@@1@22@@danf@17-8-2009 10801100@unknown@formal@none@1@S@Speech [[waveforms]] are generated from HMMs themselves based on the [[maximum likelihood]] criterion.@@@@1@13@@danf@17-8-2009 10801110@unknown@formal@none@1@S@=== Sinewave synthesis ===@@@@1@4@@danf@17-8-2009 10801120@unknown@formal@none@1@S@[[Sinewave synthesis]] is a technique for synthesizing speech by replacing the [[formants]] (main bands of energy) with pure tone whistles.@@@@1@20@@danf@17-8-2009 10801130@unknown@formal@none@1@S@== Challenges ==@@@@1@3@@danf@17-8-2009 10801140@unknown@formal@none@1@S@=== Text normalization challenges ===@@@@1@5@@danf@17-8-2009 10801150@unknown@formal@none@1@S@The process of normalizing text is rarely straightforward.@@@@1@8@@danf@17-8-2009 10801160@unknown@formal@none@1@S@Texts are full of [[Heteronym (linguistics)|heteronym]]s, [[number]]s, and [[abbreviation]]s that all require expansion into a phonetic representation.@@@@1@17@@danf@17-8-2009 10801170@unknown@formal@none@1@S@There are many spellings in English which are pronounced differently based on context.@@@@1@13@@danf@17-8-2009 10801180@unknown@formal@none@1@S@For example, "My latest project is to learn how to better project my voice" contains two pronunciations of "project".@@@@1@19@@danf@17-8-2009 10801190@unknown@formal@none@1@S@Most text-to-speech (TTS) systems do not generate semantic representations of their input texts, as processes for doing so are not reliable, well understood, or computationally effective.@@@@1@26@@danf@17-8-2009 10801200@unknown@formal@none@1@S@As a result, various [[heuristic]] techniques are used to guess the proper way to disambiguate homographs, like examining neighboring words and using statistics about frequency of occurrence.@@@@1@27@@danf@17-8-2009 10801210@unknown@formal@none@1@S@Deciding how to convert numbers is another problem that TTS systems have to address.@@@@1@14@@danf@17-8-2009 10801220@unknown@formal@none@1@S@It is a simple programming challenge to convert a number into words, like "1325" 
becoming "one thousand three hundred twenty-five."@@@@1@20@@danf@17-8-2009 10801230@unknown@formal@none@1@S@However, numbers occur in many different contexts; when a year or part of an address, "1325" should likely be read as "thirteen twenty-five", or, when part of a [[social security number]], as "one three two five".@@@@1@36@@danf@17-8-2009 10801240@unknown@formal@none@1@S@A TTS system can often infer how to expand a number based on surrounding words, numbers, and punctuation, and sometimes the system provides a way to specify the context if it is ambiguous.@@@@1@33@@danf@17-8-2009 10801250@unknown@formal@none@1@S@Similarly, abbreviations can be ambiguous.@@@@1@5@@danf@17-8-2009 10801260@unknown@formal@none@1@S@For example, the abbreviation "in" for "inches" must be differentiated from the word "in", and the address "12 St John St." uses the same abbreviation for both "Saint" and "Street".@@@@1@30@@danf@17-8-2009 10801270@unknown@formal@none@1@S@TTS systems with intelligent front ends can make educated guesses about ambiguous abbreviations, while others provide the same result in all cases, resulting in nonsensical (and sometimes comical) outputs.@@@@1@29@@danf@17-8-2009 10801280@unknown@formal@none@1@S@=== Text-to-phoneme challenges ===@@@@1@4@@danf@17-8-2009 10801290@unknown@formal@none@1@S@Speech synthesis systems use two basic approaches to determine the pronunciation of a word based on its spelling, a process which is often called text-to-phoneme or grapheme-to-phoneme conversion ([[phoneme]] is the term used by linguists to describe distinctive sounds in a language).@@@@1@42@@danf@17-8-2009 10801300@unknown@formal@none@1@S@The simplest approach to text-to-phoneme conversion is the dictionary-based approach, where a large dictionary containing all the words of a language and their correct pronunciations is stored by the program.@@@@1@30@@danf@17-8-2009 10801310@unknown@formal@none@1@S@Determining the correct pronunciation of each word is a matter of looking up each word in the dictionary and replacing the spelling with the pronunciation specified in the dictionary.@@@@1@29@@danf@17-8-2009 10801320@unknown@formal@none@1@S@The other approach is rule-based, in which pronunciation rules are applied to words to determine their pronunciations based on their spellings.@@@@1@21@@danf@17-8-2009 10801330@unknown@formal@none@1@S@This is similar to the "sounding out", or [[synthetic phonics]], approach to learning reading.@@@@1@14@@danf@17-8-2009 10801340@unknown@formal@none@1@S@Each approach has advantages and drawbacks.@@@@1@6@@danf@17-8-2009 10801350@unknown@formal@none@1@S@The dictionary-based approach is quick and accurate, but completely fails if it is given a word which is not in its dictionary.@@@@1@22@@danf@17-8-2009 10801360@unknown@formal@none@1@S@As dictionary size grows, so too does the memory space requirements of the synthesis system.@@@@1@15@@danf@17-8-2009 10801370@unknown@formal@none@1@S@On the other hand, the rule-based approach works on any input, but the complexity of the rules grows substantially as the system takes into account irregular spellings or pronunciations.@@@@1@29@@danf@17-8-2009 10801380@unknown@formal@none@1@S@(Consider that the word "of" is very common in English, yet is the only word in which the letter "f" is pronounced [v].)@@@@1@23@@danf@17-8-2009 10801390@unknown@formal@none@1@S@As a result, nearly all speech synthesis systems use a combination of these approaches.@@@@1@14@@danf@17-8-2009 10801400@unknown@formal@none@1@S@Some languages, like [[Spanish 
language|Spanish]], have a very regular writing system, and the prediction of the pronunciation of words based on their spellings is quite successful.@@@@1@26@@danf@17-8-2009 10801410@unknown@formal@none@1@S@Speech synthesis systems for such languages often use the rule-based method extensively, resorting to dictionaries only for those few words, like foreign names and borrowings, whose pronunciations are not obvious from their spellings.@@@@1@33@@danf@17-8-2009 10801420@unknown@formal@none@1@S@On the other hand, speech synthesis systems for languages like [[English language|English]], which have extremely irregular spelling systems, are more likely to rely on dictionaries, and to use rule-based methods only for unusual words, or words that aren't in their dictionaries.@@@@1@41@@danf@17-8-2009 10801430@unknown@formal@none@1@S@=== Evaluation challenges ===@@@@1@4@@danf@17-8-2009 10801440@unknown@formal@none@1@S@It is very difficult to evaluate speech synthesis systems consistently because there is no universally agreed evaluation criterion and different organizations usually use different speech data.@@@@1@24@@danf@17-8-2009 10801450@unknown@formal@none@1@S@The quality of a speech synthesis system depends highly on the quality of the recording.@@@@1@14@@danf@17-8-2009 10801460@unknown@formal@none@1@S@Therefore, evaluating speech synthesis systems is almost the same as evaluating the recording skills.@@@@1@14@@danf@17-8-2009 10801470@unknown@formal@none@1@S@Recently, researchers have started evaluating speech synthesis systems using a common speech dataset.@@@@1@12@@danf@17-8-2009 10801480@unknown@formal@none@1@S@This may help people to compare the differences between technologies rather than between recordings.@@@@1@13@@danf@17-8-2009 10801490@unknown@formal@none@1@S@=== Prosodics and emotional content ===@@@@1@6@@danf@17-8-2009 10801500@unknown@formal@none@1@S@A recent study in the journal ''Speech Communication'' by Amy Drahota and colleagues at the [[University of Portsmouth]], [[UK]], reported that listeners to voice recordings could determine, at better than chance levels, whether or not the speaker was smiling.@@@@1@40@@danf@17-8-2009 10801510@unknown@formal@none@1@S@It was suggested that identification of the vocal features which signal emotional content may be used to help make synthesized speech sound more natural.@@@@1@24@@danf@17-8-2009 10801520@unknown@formal@none@1@S@== Dedicated hardware ==@@@@1@4@@danf@17-8-2009 10801530@unknown@formal@none@1@S@*Votrax@@@@1@1@@danf@17-8-2009 10801540@unknown@formal@none@1@S@**SC-01A (analog formant)@@@@1@3@@danf@17-8-2009 10801550@unknown@formal@none@1@S@**SC-02 / SSI-263 / "Arctic 263"@@@@1@6@@danf@17-8-2009 10801560@unknown@formal@none@1@S@*General Instruments SP0256-AL2 (CTS256A-AL2, MEA8000)@@@@1@5@@danf@17-8-2009 10801570@unknown@formal@none@1@S@*Magnevation SpeakJet (www.speechchips.com TTS256)@@@@1@4@@danf@17-8-2009 10801580@unknown@formal@none@1@S@*Savage Innovations SoundGin@@@@1@3@@danf@17-8-2009 10801590@unknown@formal@none@1@S@*National Semiconductor DT1050 Digitalker (Mozer)@@@@1@5@@danf@17-8-2009 10801600@unknown@formal@none@1@S@*Silicon Systems SSI 263 (analog formant)@@@@1@6@@danf@17-8-2009 10801610@unknown@formal@none@1@S@*Texas Instruments@@@@1@2@@danf@17-8-2009 10801620@unknown@formal@none@1@S@**TMS5110A (LPC)@@@@1@2@@danf@17-8-2009 10801630@unknown@formal@none@1@S@**TMS5200@@@@1@1@@danf@17-8-2009 10801640@unknown@formal@none@1@S@*Oki Semiconductor@@@@1@2@@danf@17-8-2009 10801650@unknown@formal@none@1@S@**MSM5205@@@@1@1@@danf@17-8-2009
10801660@unknown@formal@none@1@S@**MSM5218RS (ADPCM)@@@@1@2@@danf@17-8-2009 10801670@unknown@formal@none@1@S@*Toshiba T6721A@@@@1@2@@danf@17-8-2009 10801680@unknown@formal@none@1@S@*Philips PCF8200@@@@1@2@@danf@17-8-2009 10801690@unknown@formal@none@1@S@== Computer operating systems or outlets with speech synthesis ==@@@@1@10@@danf@17-8-2009 10801700@unknown@formal@none@1@S@=== Apple ===@@@@1@3@@danf@17-8-2009 10801710@unknown@formal@none@1@S@The first speech system integrated into an [[operating system]] was [[Apple Computer]]'s [[PlainTalk#The original MacInTalk|MacInTalk]] in 1984.@@@@1@17@@danf@17-8-2009 10801720@unknown@formal@none@1@S@Since the 1980s Macintosh Computers offered text to speech capabilities through The MacinTalk software.@@@@1@14@@danf@17-8-2009 10801730@unknown@formal@none@1@S@In the early 1990s Apple expanded its capabilities offering system wide text-to-speech support.@@@@1@13@@danf@17-8-2009 10801740@unknown@formal@none@1@S@With the introduction of faster PowerPC based computers they included higher quality voice sampling.@@@@1@14@@danf@17-8-2009 10801750@unknown@formal@none@1@S@Apple also introduced [[speech recognition]] into its systems which provided a fluid command set.@@@@1@14@@danf@17-8-2009 10801760@unknown@formal@none@1@S@More recently, Apple has added sample-based voices.@@@@1@7@@danf@17-8-2009 10801770@unknown@formal@none@1@S@Starting as a curiosity, the speech system of Apple [[Macintosh (computer)|Macintosh]] has evolved into a cutting edge fully-supported program, [[PlainTalk]], for people with vision problems.@@@@1@25@@danf@17-8-2009 10801780@unknown@formal@none@1@S@[[VoiceOver]] was included in Mac OS Tiger and more recently Mac OS Leopard.@@@@1@13@@danf@17-8-2009 10801790@unknown@formal@none@1@S@The voice shipping with Mac OS X 10.5 ("Leopard") is called "Alex" and features the taking of realistic-sounding breaths between sentences, as well as improved clarity at high read rates.@@@@1@30@@danf@17-8-2009 10801800@unknown@formal@none@1@S@=== AmigaOS ===@@@@1@3@@danf@17-8-2009 10801810@unknown@formal@none@1@S@The second operating system with advanced speech synthesis capabilities was [[AmigaOS]], introduced in 1985.@@@@1@14@@danf@17-8-2009 10801820@unknown@formal@none@1@S@The voice synthesis was licensed by [[Commodore International]] from a third-party software house (Don't Ask Software, now Softvoice, Inc.) 
and it featured a complete system of voice emulation, with both male and female voices and "stress" indicator markers, made possible by advanced features of the [[Amiga]] hardware audio [[chipset]].@@@@1@49@@danf@17-8-2009 10801830@unknown@formal@none@1@S@It was divided into a narrator device and a translator library.@@@@1@11@@danf@17-8-2009 10801840@unknown@formal@none@1@S@Amiga [[AmigaOS#Speech synthesis|Speak Handler]] featured a text-to-speech translator.@@@@1@8@@danf@17-8-2009 10801850@unknown@formal@none@1@S@AmigaOS considered speech synthesis a virtual hardware device, so the user could even redirect console output to it.@@@@1@18@@danf@17-8-2009 10801860@unknown@formal@none@1@S@Some Amiga programs, such as word processors, made extensive use of the speech system.@@@@1@14@@danf@17-8-2009 10801870@unknown@formal@none@1@S@=== Microsoft Windows ===@@@@1@4@@danf@17-8-2009 10801880@unknown@formal@none@1@S@Modern [[Microsoft Windows|Windows]] systems use [[Speech Application Programming Interface#SAPI 1-4 API family|SAPI4]]- and [[Speech Application Programming Interface#SAPI 5 API family|SAPI5]]-based speech systems that include a [[speech recognition]] engine (SRE).@@@@1@29@@danf@17-8-2009 10801890@unknown@formal@none@1@S@SAPI 4.0 was available on Microsoft-based operating systems as a third-party add-on for systems like [[Windows 95]] and [[Windows 98]].@@@@1@20@@danf@17-8-2009 10801900@unknown@formal@none@1@S@[[Windows 2000]] added a speech synthesis program called [[Microsoft Narrator|Narrator]], directly available to users.@@@@1@14@@danf@17-8-2009 10801910@unknown@formal@none@1@S@All Windows-compatible programs could make use of speech synthesis features, available through menus once installed on the system.@@@@1@18@@danf@17-8-2009 10801920@unknown@formal@none@1@S@[[Microsoft Speech Server]] is a complete package for voice synthesis and recognition, for commercial applications such as [[call centers]].@@@@1@19@@danf@17-8-2009 10801930@unknown@formal@none@1@S@=== Internet ===@@@@1@3@@danf@17-8-2009 10801940@unknown@formal@none@1@S@Currently, there are a number of [[Application software|applications]], [[plugin]]s and [[gadget]]s that can read messages directly from an [[e-mail client]] and web pages from a [[web browser]].@@@@1@27@@danf@17-8-2009 10801950@unknown@formal@none@1@S@Some specialized [[Computer software|software]] can narrate [[RSS|RSS-feeds]].@@@@1@7@@danf@17-8-2009 10801960@unknown@formal@none@1@S@On one hand, online RSS-narrators simplify information delivery by allowing users to listen to their favourite news sources and to convert them to [[podcast]]s.@@@@1@24@@danf@17-8-2009 10801970@unknown@formal@none@1@S@On the other hand, on-line RSS-readers are available on almost any [[Personal computer|PC]] connected to the Internet.@@@@1@17@@danf@17-8-2009 10801980@unknown@formal@none@1@S@Users can download generated audio files to portable devices, e.g. with the help of a [[podcast]] receiver, and listen to them while walking, jogging or commuting to work.@@@@1@27@@danf@17-8-2009 10801990@unknown@formal@none@1@S@A growing field in internet-based TTS technology is web-based assistive technology, e.g.
Talklets.@@@@1@14@@danf@17-8-2009 10802000@unknown@formal@none@1@S@This web-based approach to what has traditionally been locally installed software can give many of those who need such software for accessibility reasons the ability to access web content from public machines, or machines belonging to others.@@@@1@37@@danf@17-8-2009 10802010@unknown@formal@none@1@S@While responsiveness is not as immediate as that of applications installed locally, the 'access anywhere' nature of this approach is its key benefit.@@@@1@25@@danf@17-8-2009 10802020@unknown@formal@none@1@S@=== Others ===@@@@1@3@@danf@17-8-2009 10802030@unknown@formal@none@1@S@* Some models of Texas Instruments home computers produced in 1979 and 1981 ([[TI-99/4A|Texas Instruments TI-99/4 and TI-99/4A]]) were capable of text-to-phoneme synthesis or reciting complete words and phrases (text-to-dictionary), using a very popular Speech Synthesizer peripheral.@@@@1@37@@danf@17-8-2009 10802040@unknown@formal@none@1@S@TI used a proprietary [[codec]] to embed complete spoken phrases into applications, primarily video games.@@@@1@15@@danf@17-8-2009 10802050@unknown@formal@none@1@S@* Systems that operate on free and open source software systems, including [[Linux|GNU/Linux]], are various; they include [[open-source]] programs such as the [[Festival Speech Synthesis System]], which uses diphone-based synthesis (and can use a limited number of [[MBROLA]] voices), and gnuspeech from the [[Free Software Foundation]], which uses articulatory synthesis.@@@@1@50@@danf@17-8-2009 10802060@unknown@formal@none@1@S@Other commercial vendor software also runs on GNU/Linux.@@@@1@8@@danf@17-8-2009 10802070@unknown@formal@none@1@S@* Several commercial companies are also developing speech synthesis systems (this list reports them for information only, without endorsing any specific product): [http://www.acapela-group.com Acapela Group], [[AT&T]], [[Cepstral]], [[DECtalk]], [[IBM ViaVoice]], [[IVONA|IVONA TTS]], [http://www.loquendo.com Loquendo TTS], [http://www.neospeech.com NeoSpeech TTS], [[Nuance Communications]], Rhetorical Systems, [http://www.svox.com SVOX] and [http://www.yakitome.com YAKiToMe!].@@@@1@51@@danf@17-8-2009 10802080@unknown@formal@none@1@S@* Companies that developed speech synthesis systems but are no longer in this business include BeST Speech (bought by L&H), [[Lernout & Hauspie]] (bankrupt) and [[SpeechWorks]] (bought by Nuance).@@@@1@29@@danf@17-8-2009 10802090@unknown@formal@none@1@S@== Speech synthesis markup languages ==@@@@1@6@@danf@17-8-2009 10802100@unknown@formal@none@1@S@A number of [[markup language]]s have been established for the rendition of text as speech in an [[XML]]-compliant format.@@@@1@19@@danf@17-8-2009 10802110@unknown@formal@none@1@S@The most recent is [[Speech Synthesis Markup Language]] (SSML), which became a [[W3C recommendation]] in 2004.@@@@1@16@@danf@17-8-2009 10802120@unknown@formal@none@1@S@Older speech synthesis markup languages include Java Speech Markup Language ([[JSML]]) and [[SABLE]].@@@@1@13@@danf@17-8-2009 10802130@unknown@formal@none@1@S@Although each of these was proposed as a standard, none of them has been widely adopted.@@@@1@16@@danf@17-8-2009 10802140@unknown@formal@none@1@S@Speech synthesis markup languages are distinguished from dialogue markup languages.@@@@1@10@@danf@17-8-2009 10802150@unknown@formal@none@1@S@[[VoiceXML]], for example, includes tags related to speech recognition, dialogue management and touchtone dialing, in addition to text-to-speech
markup.@@@@1@19@@danf@17-8-2009 10802160@unknown@formal@none@1@S@==Applications==@@@@1@1@@danf@17-8-2009 10802170@unknown@formal@none@1@S@===Accessibility===@@@@1@1@@danf@17-8-2009 10802180@unknown@formal@none@1@S@Speech synthesis has long been a vital [[assistive technology]] tool and its application in this area is significant and widespread.@@@@1@20@@danf@17-8-2009 10802190@unknown@formal@none@1@S@It allows environmental barriers to be removed for people with a wide range of disabilities.@@@@1@15@@danf@17-8-2009 10802200@unknown@formal@none@1@S@The longest-standing application has been in the use of [[screenreaders]] for people with [[visual impairment]], but text-to-speech systems are now commonly used by people with [[dyslexia]] and other reading difficulties as well as by pre-literate youngsters.@@@@1@36@@danf@17-8-2009 10802210@unknown@formal@none@1@S@They are also frequently employed to aid those with severe [[speech impairment]], usually through a dedicated [[voice output communication aid]].@@@@1@20@@danf@17-8-2009 10802220@unknown@formal@none@1@S@===News service===@@@@1@2@@danf@17-8-2009 10802230@unknown@formal@none@1@S@Sites such as [[Ananova]] have used speech synthesis to convert written news to audio content, which can be used for mobile applications.@@@@1@22@@danf@17-8-2009 10802240@unknown@formal@none@1@S@===Entertainment===@@@@1@1@@danf@17-8-2009 10802250@unknown@formal@none@1@S@Speech synthesis techniques are also used in entertainment productions such as games and anime.@@@@1@17@@danf@17-8-2009 10802260@unknown@formal@none@1@S@In 2007, Animo Limited announced the development of a software application package based on its speech synthesis software FineSpeech, explicitly geared towards customers in the entertainment industries, and able to generate narration and lines of dialogue according to user specifications.@@@@1@39@@danf@17-8-2009 10802270@unknown@formal@none@1@S@Software such as [[Vocaloid]] can generate singing voices via lyrics and melody.@@@@1@12@@danf@17-8-2009 10802280@unknown@formal@none@1@S@The Singing Computer project (which uses the [[GNU General Public License|GPL]] software [[GNU LilyPond|Lilypond]] and [[Festival Speech Synthesis System|Festival]]) has a similar aim: helping blind people check their lyric input.@@@@1@33@@danf@17-8-2009 10810010@unknown@formal@none@1@S@
Statistical classification
@@@@1@2@@danf@17-8-2009 10810020@unknown@formal@none@1@S@'''Statistical classification''' is a procedure in which individual items are placed into groups based on quantitative information on one or more characteristics inherent in the items (referred to as traits, variables, characters, etc) and based on a [[training set]] of previously labeled items.@@@@1@43@@danf@17-8-2009 10810030@unknown@formal@none@1@S@Formally, the problem can be stated as follows: given training data \\{(\\mathbf{x_1},y_1),\\dots,(\\mathbf{x_n}, y_n)\\} produce a classifier h:\\mathcal{X}\\rightarrow\\mathcal{Y} which maps an object \\mathbf{x} \\in \\mathcal{X} to its classification label y \\in \\mathcal{Y}.@@@@1@31@@danf@17-8-2009 10810040@unknown@formal@none@1@S@For example, if the problem is filtering spam, then \\mathbf{x_i} is some representation of an email and y is either "Spam" or "Non-Spam".@@@@1@23@@danf@17-8-2009 10810050@unknown@formal@none@1@S@Statistical classification algorithms are typically used in [[pattern recognition]] systems.@@@@1@10@@danf@17-8-2009 10810060@unknown@formal@none@1@S@'''Note:''' in [[community ecology]], the term "classification" is synonymous with what is commonly known (in [[machine learning]]) as [[data clustering|clustering]].@@@@1@20@@danf@17-8-2009 10810070@unknown@formal@none@1@S@See that article for more information about purely [[unsupervised learning|unsupervised]] techniques.@@@@1@11@@danf@17-8-2009 10810080@unknown@formal@none@1@S@* The second problem is to consider classification as an [[estimation]] problem, where the goal is to estimate a function of the form@@@@1@23@@danf@17-8-2009 10810090@unknown@formal@none@1@S@:P({\\rm class}|{\\vec x}) = f\\left(\\vec x;\\vec \\theta\\right) where the feature vector input is \\vec x, and the function f is typically parameterized by some parameters \\vec \\theta.@@@@1@27@@danf@17-8-2009 10810100@unknown@formal@none@1@S@In the [[Bayesian statistics|Bayesian]] approach to this problem, instead of choosing a single parameter vector \\vec \\theta, the result is integrated over all possible thetas, with the thetas weighted by how likely they are given the training data D:@@@@1@39@@danf@17-8-2009 10810110@unknown@formal@none@1@S@:P({\\rm class}|{\\vec x}) = \\int f\\left(\\vec x;\\vec \\theta\\right)P(\\vec \\theta|D) d\\vec \\theta@@@@1@11@@danf@17-8-2009 10810120@unknown@formal@none@1@S@* The third problem is related to the second, but the problem is to estimate the [[conditional probability|class-conditional probabilities]] P(\\vec x|{\\rm class}) and then use [[Bayes' rule]] to produce the class probability as in the second problem.@@@@1@37@@danf@17-8-2009 10810130@unknown@formal@none@1@S@Examples of classification algorithms include:@@@@1@5@@danf@17-8-2009 10810140@unknown@formal@none@1@S@* [[Linear classifier]]s@@@@1@3@@danf@17-8-2009 10810150@unknown@formal@none@1@S@** [[Fisher's linear discriminant]]@@@@1@4@@danf@17-8-2009 10810160@unknown@formal@none@1@S@** [[Logistic regression]]@@@@1@3@@danf@17-8-2009 10810170@unknown@formal@none@1@S@** [[Naive Bayes classifier]]@@@@1@4@@danf@17-8-2009 10810180@unknown@formal@none@1@S@** [[Perceptron]]@@@@1@2@@danf@17-8-2009 10810190@unknown@formal@none@1@S@** [[Support vector machine]]s@@@@1@4@@danf@17-8-2009 10810200@unknown@formal@none@1@S@* [[Quadratic classifier]]s@@@@1@3@@danf@17-8-2009 10810210@unknown@formal@none@1@S@* [[Nearest_neighbor_(pattern_recognition)|k-nearest neighbor]]@@@@1@3@@danf@17-8-2009 10810220@unknown@formal@none@1@S@* [[Boosting]]@@@@1@2@@danf@17-8-2009 
10810230@unknown@formal@none@1@S@* [[Decision tree]]s@@@@1@3@@danf@17-8-2009 10810240@unknown@formal@none@1@S@** [[Random forest]]s@@@@1@3@@danf@17-8-2009 10810250@unknown@formal@none@1@S@* [[Artificial neural networks|Neural network]]s@@@@1@5@@danf@17-8-2009 10810260@unknown@formal@none@1@S@* [[Bayesian network]]s@@@@1@3@@danf@17-8-2009 10810270@unknown@formal@none@1@S@* [[Hidden Markov model]]s@@@@1@4@@danf@17-8-2009 10810280@unknown@formal@none@1@S@An intriguing problem in pattern recognition yet to be solved is the relationship between the problem to be solved (data to be classified) and the performance of various pattern recognition algorithms (classifiers).@@@@1@32@@danf@17-8-2009 10810290@unknown@formal@none@1@S@Van der Walt and Barnard (see reference section) investigated very specific artificial data sets to determine conditions under which certain classifiers perform better and worse than others.@@@@1@27@@danf@17-8-2009 10810300@unknown@formal@none@1@S@Classifier performance depends greatly on the characteristics of the data to be classified.@@@@1@13@@danf@17-8-2009 10810310@unknown@formal@none@1@S@There is no single classifier that works best on all given problems (a phenomenon that may be explained by the [[No free lunch in search and optimization|No-free-lunch theorem]]).@@@@1@28@@danf@17-8-2009 10810320@unknown@formal@none@1@S@Various empirical tests have been performed to compare classifier performance and to find the characteristics of data that determine classifier performance.@@@@1@21@@danf@17-8-2009 10810330@unknown@formal@none@1@S@Determining a suitable classifier for a given problem is however still more an art than a science.@@@@1@17@@danf@17-8-2009 10810340@unknown@formal@none@1@S@The most widely used classifiers are the [[Neural Network]] (Multi-layer Perceptron), [[Support Vector Machines]], [[KNN|k-Nearest Neighbours]], Gaussian Mixture Model, Gaussian, [[Naive Bayes]], [[Decision Tree]] and [[Radial Basis Function|RBF]] classifiers.@@@@1@29@@danf@17-8-2009 10810350@unknown@formal@none@1@S@== Evaluation ==@@@@1@3@@danf@17-8-2009 10810360@unknown@formal@none@1@S@The measures [[Precision and Recall]] are popular metrics used to evaluate the quality of a classification system.@@@@1@17@@danf@17-8-2009 10810370@unknown@formal@none@1@S@More recently, [[Receiver Operating Characteristic]] (ROC) curves have been used to evaluate the tradeoff between true- and false-positive rates of classification algorithms.@@@@1@22@@danf@17-8-2009 10810380@unknown@formal@none@1@S@==Application domains==@@@@1@2@@danf@17-8-2009 10810390@unknown@formal@none@1@S@* [[Computer vision]]@@@@1@3@@danf@17-8-2009 10810400@unknown@formal@none@1@S@** [[Medical Imaging]] and Medical Image Analysis@@@@1@7@@danf@17-8-2009 10810410@unknown@formal@none@1@S@** [[Optical character recognition]]@@@@1@4@@danf@17-8-2009 10810420@unknown@formal@none@1@S@* [[Geostatistics]]@@@@1@2@@danf@17-8-2009 10810430@unknown@formal@none@1@S@* [[Speech recognition]]@@@@1@3@@danf@17-8-2009 10810440@unknown@formal@none@1@S@* [[Handwriting recognition]]@@@@1@3@@danf@17-8-2009 10810450@unknown@formal@none@1@S@* [[Biometric]] identification@@@@1@3@@danf@17-8-2009 10810460@unknown@formal@none@1@S@* [[Natural language processing]]@@@@1@4@@danf@17-8-2009 10810470@unknown@formal@none@1@S@* [[Document classification]]@@@@1@3@@danf@17-8-2009 10810480@unknown@formal@none@1@S@* Internet [[search engines]]@@@@1@4@@danf@17-8-2009 10810490@unknown@formal@none@1@S@* [[Credit scoring]]@@@@1@3@@danf@17-8-2009 10820010@unknown@formal@none@1@S@
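To make the formal setup above concrete — a training set {(x_1, y_1), ..., (x_n, y_n)} and a learned classifier h mapping an object x to a label y, with spam filtering as the running example — the following is a minimal sketch of one of the listed algorithms, a [[Naive Bayes classifier]] over bag-of-words features. The tiny training set, the add-one smoothing and all identifiers are illustrative assumptions rather than part of any particular system described here.
<source lang="python">
from collections import Counter
from math import log

# Toy labelled training data {(x_1, y_1), ..., (x_n, y_n)}: x is an email text, y its class.
training = [
    ("win money now", "Spam"),
    ("cheap pills win prize", "Spam"),
    ("meeting agenda attached", "Non-Spam"),
    ("lunch at noon tomorrow", "Non-Spam"),
]

def train(data):
    """Estimate class counts (for priors P(y)) and per-class word counts."""
    priors = Counter(y for _, y in data)
    word_counts = {y: Counter() for y in priors}
    for text, y in data:
        word_counts[y].update(text.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return priors, word_counts, vocab

def classify(text, priors, word_counts, vocab):
    """Return the label y maximising log P(y) + sum over words of log P(w | y)."""
    n = sum(priors.values())
    best_label, best_score = None, float("-inf")
    for y, prior in priors.items():
        total = sum(word_counts[y].values())
        score = log(prior / n)  # log P(y)
        for w in text.split():
            # add-one smoothed estimate of P(w | y)
            score += log((word_counts[y][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = y, score
    return best_label

priors, word_counts, vocab = train(training)
print(classify("win a cheap prize", priors, word_counts, vocab))  # prints "Spam"
</source>
A real system would use a far larger training set and more careful feature extraction, but the mapping from an object x (here, an email) to a label y ("Spam" or "Non-Spam") is exactly the classifier h described above.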
Statistical machine translation
@@@@1@3@@danf@17-8-2009 10820020@unknown@formal@none@1@S@'''Statistical machine translation''' ('''SMT''') is a [[machine translation]] paradigm where translations are generated on the basis of statistical models whose parameters are derived from the analysis of bilingual [[text corpora]].@@@@1@30@@danf@17-8-2009 10820030@unknown@formal@none@1@S@The statistical approach contrasts with the rule-based approaches to [[machine translation]] as well as with [[example-based machine translation]].@@@@1@18@@danf@17-8-2009 10820040@unknown@formal@none@1@S@The first ideas of statistical machine translation were introduced by [[Warren Weaver]] in 1949, including the ideas of applying [[Claude Shannon]]'s [[information theory]].@@@@1@23@@danf@17-8-2009 10820050@unknown@formal@none@1@S@Statistical machine translation was re-introduced in 1991 by researchers at [[IBM]]'s [[Thomas J. Watson Research Center]] and has contributed to the significant resurgence in interest in machine translation in recent years.@@@@1@31@@danf@17-8-2009 10820060@unknown@formal@none@1@S@As of 2006, it is by far the most widely-studied machine translation paradigm.@@@@1@13@@danf@17-8-2009 10820070@unknown@formal@none@1@S@==Benefits==@@@@1@1@@danf@17-8-2009 10820080@unknown@formal@none@1@S@The benefits of statistical machine translation over traditional paradigms that are most often cited are the following:@@@@1@17@@danf@17-8-2009 10820090@unknown@formal@none@1@S@* '''Better use of resources'''@@@@1@5@@danf@17-8-2009 10820100@unknown@formal@none@1@S@**There is a great deal of natural language in machine-readable format.@@@@1@11@@danf@17-8-2009 10820110@unknown@formal@none@1@S@**Generally, SMT systems are not tailored to any specific pair of languages.@@@@1@12@@danf@17-8-2009 10820120@unknown@formal@none@1@S@**Rule-based translation systems require the manual development of linguistic rules, which can be costly, and which often do not generalize to other languages.@@@@1@23@@danf@17-8-2009 10820130@unknown@formal@none@1@S@* '''More natural translations'''@@@@1@4@@danf@17-8-2009 10820140@unknown@formal@none@1@S@The ideas behind statistical machine translation come out of [[information theory]].@@@@1@11@@danf@17-8-2009 10820150@unknown@formal@none@1@S@Essentially, the document is translated on the [[probability]] p(e|f) that a string e in native language (for example, English) is the translation of a string f in foreign language (for example, French).@@@@1@32@@danf@17-8-2009 10820160@unknown@formal@none@1@S@Generally, these probabilities are estimated using techniques of [[parameter estimation]].@@@@1@10@@danf@17-8-2009 10820170@unknown@formal@none@1@S@The [[Bayes Theorem]] is applied to p(e|f), the probability that the foreign string produces the native string to get p(e|f) \\propto p(f|e) p(e), where the [[translation model]] p(f|e) is the probability that the native string is the translation of the foreign string, and the [[language model]] p(e) is the probability of seeing that native string.@@@@1@55@@danf@17-8-2009 10820180@unknown@formal@none@1@S@Mathematically speaking, finding the best translation \\tilde{e} is done by picking up the one that gives the highest probability:@@@@1@19@@danf@17-8-2009 10820190@unknown@formal@none@1@S@: \\tilde{e} = arg \\max_{e \\in e^*} p(e|f) = arg \\max_{e\\in e^*} p(f|e) p(e) .@@@@1@15@@danf@17-8-2009 10820200@unknown@formal@none@1@S@For a rigorous implementation of this one would have to perform an exhaustive search by going through all strings e^* in the native 
language.@@@@1@24@@danf@17-8-2009 10820210@unknown@formal@none@1@S@Performing the search efficiently is the work of a [[machine translation decoder]] that uses the foreign string, heuristics and other methods to limit the search space while keeping acceptable quality.@@@@1@34@@danf@17-8-2009 10820220@unknown@formal@none@1@S@This trade-off between quality and time usage can also be found in [[speech recognition]].@@@@1@14@@danf@17-8-2009 10820230@unknown@formal@none@1@S@As the translation systems are not able to store all native strings and their translations, a document is typically translated sentence by sentence, but even this is not enough.@@@@1@29@@danf@17-8-2009 10820240@unknown@formal@none@1@S@Language models are typically approximated by smoothed ''n''-gram models, and similar approaches have been applied to translation models, but there is additional complexity due to different sentence lengths and word orders in the languages.@@@@1@34@@danf@17-8-2009 10820250@unknown@formal@none@1@S@The statistical translation models were initially [[word]] based (Models 1-5 from [[IBM]]), but significant advances were made with the introduction of [[phrase]] based models.@@@@1@24@@danf@17-8-2009 10820260@unknown@formal@none@1@S@Recent work has incorporated [[syntax]] or quasi-syntactic structures.@@@@1@8@@danf@17-8-2009 10820270@unknown@formal@none@1@S@==Word-based translation==@@@@1@2@@danf@17-8-2009 10820280@unknown@formal@none@1@S@In word-based translation, translated elements are words.@@@@1@7@@danf@17-8-2009 10820290@unknown@formal@none@1@S@Typically, the numbers of words in the source and translated sentences differ, due to compound words, morphology and idioms.@@@@1@17@@danf@17-8-2009 10820300@unknown@formal@none@1@S@The ratio of the lengths of sequences of translated words is called fertility, which tells how many foreign words each native word produces.@@@@1@23@@danf@17-8-2009 10820310@unknown@formal@none@1@S@Simple word-based translation is not able to translate language pairs with fertility rates different from one.@@@@1@16@@danf@17-8-2009 10820320@unknown@formal@none@1@S@To let word-based translation systems handle, for instance, high fertility rates, the system can map a single word to multiple words, but not vice versa.@@@@1@28@@danf@17-8-2009 10820330@unknown@formal@none@1@S@For instance, if we are translating from French to English, each word in English could produce zero or more French words.@@@@1@21@@danf@17-8-2009 10820340@unknown@formal@none@1@S@But there is no way to group two English words that produce a single French word.@@@@1@14@@danf@17-8-2009 10820350@unknown@formal@none@1@S@An example of a word-based translation system is the freely available [[GIZA++]] package ([[GPL]]ed), which includes [[IBM]] models.@@@@1@18@@danf@17-8-2009 10820360@unknown@formal@none@1@S@==Phrase-based translation==@@@@1@2@@danf@17-8-2009 10820370@unknown@formal@none@1@S@In phrase-based translation, the aim is to reduce the restrictions of word-based translation by translating whole sequences of words, where the lengths of the source and target sequences can differ.@@@@1@28@@danf@17-8-2009 10820380@unknown@formal@none@1@S@The sequences of words are called blocks or phrases, but they are typically not linguistic [[phrase]]s but rather phrases found using statistical methods from the corpus.@@@@1@26@@danf@17-8-2009 10820390@unknown@formal@none@1@S@Restricting the phrases to linguistic phrases has been shown to decrease translation quality.@@@@1@13@@danf@17-8-2009 
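As a concrete illustration of the formulation above — choosing the translation that maximizes p(f|e)·p(e) — and of phrase-based models, the following is a minimal sketch of noisy-channel scoring over a toy phrase table. The phrase table, the unigram language model and the exhaustive, monotone (no re-ordering) search are invented for illustration and stand in for the real translation model, smoothed n-gram language model and decoder discussed above.
<source lang="python">
import itertools
from math import prod

# Toy phrase table, indexed by foreign phrase for lookup; the stored number is taken
# to be p(f_phrase | e_phrase), the translation-model probability (values invented).
phrase_table = {
    ("la", "maison"): {("the", "house"): 0.8, ("the", "home"): 0.2},
    ("la",): {("the",): 0.9},
    ("maison",): {("house",): 0.7, ("home",): 0.3},
    ("bleue",): {("blue",): 0.9},
}

# Toy unigram language model; p(e) is approximated as the product of word probabilities.
unigram = {"the": 0.3, "house": 0.1, "home": 0.05, "blue": 0.08}

def lm_prob(words):
    return prod(unigram.get(w, 1e-6) for w in words)

def segmentations(sent):
    """All ways to split the foreign sentence into consecutive phrases from the table."""
    if not sent:
        yield []
        return
    for i in range(1, len(sent) + 1):
        head = tuple(sent[:i])
        if head in phrase_table:
            for rest in segmentations(sent[i:]):
                yield [head] + rest

def translate(foreign):
    """Pick the English string maximizing p(f|e) * p(e) over all segmentations."""
    best, best_score = None, 0.0
    for seg in segmentations(foreign):
        # One English phrase per foreign phrase, kept in monotone order (no re-ordering).
        options = [phrase_table[f].items() for f in seg]
        for choice in itertools.product(*options):
            e_words = [w for (e_phr, _) in choice for w in e_phr]
            p_f_given_e = prod(p for (_, p) in choice)  # translation model p(f|e)
            score = p_f_given_e * lm_prob(e_words)       # p(f|e) * p(e)
            if score > best_score:
                best, best_score = e_words, score
    return best, best_score

print(translate(["la", "maison", "bleue"]))  # (['the', 'house', 'blue'], ...)
</source>
Even in this toy setting it is visible why decoding is expensive: the number of segmentations and phrase choices grows quickly with sentence length, which is why real decoders prune the search space heuristically rather than enumerating it exhaustively.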
10820400@unknown@formal@none@1@S@==Syntax-based translation==@@@@1@2@@danf@17-8-2009 10820410@unknown@formal@none@1@S@==Challenges with statistical machine translation==@@@@1@5@@danf@17-8-2009 10820420@unknown@formal@none@1@S@Problems that statistical machine translation has to deal with include@@@@1@10@@danf@17-8-2009 10820430@unknown@formal@none@1@S@=== Compound words ===@@@@1@4@@danf@17-8-2009 10820440@unknown@formal@none@1@S@=== Idioms ===@@@@1@3@@danf@17-8-2009 10820450@unknown@formal@none@1@S@=== Morphology ===@@@@1@3@@danf@17-8-2009 10820460@unknown@formal@none@1@S@=== Different word orders ===@@@@1@5@@danf@17-8-2009 10820470@unknown@formal@none@1@S@Word order in languages differs.@@@@1@5@@danf@17-8-2009 10820480@unknown@formal@none@1@S@Some classification can be done by naming the typical order of subject (S), verb (V) and object (O) in a sentence, and one can talk, for instance, of SVO or VSO languages.@@@@1@32@@danf@17-8-2009 10820490@unknown@formal@none@1@S@There are also additional differences in word orders, for instance, where modifiers for nouns are located.@@@@1@16@@danf@17-8-2009 10820500@unknown@formal@none@1@S@In [[Speech Recognition]], the speech signal and the corresponding textual representation can be mapped to each other in blocks, in order.@@@@1@21@@danf@17-8-2009 10820510@unknown@formal@none@1@S@This is not always the case with the same text in two languages.@@@@1@13@@danf@17-8-2009 10820520@unknown@formal@none@1@S@For SMT, the translation model is only able to translate small sequences of words, and word order has to be taken into account somehow.@@@@1@24@@danf@17-8-2009 10820530@unknown@formal@none@1@S@A typical solution has been re-ordering models, where a distribution of location changes for each item of translation is approximated from aligned bi-text.@@@@1@22@@danf@17-8-2009 10820540@unknown@formal@none@1@S@Different location changes can be ranked with the help of the language model and the best can be selected.@@@@1@19@@danf@17-8-2009 10820550@unknown@formal@none@1@S@=== Syntax ===@@@@1@3@@danf@17-8-2009 10820560@unknown@formal@none@1@S@=== Out of vocabulary (OOV) words ===@@@@1@7@@danf@17-8-2009 10820570@unknown@formal@none@1@S@SMT systems store different word forms as separate symbols without any relation to each other, and word forms or phrases that were not in the training data cannot be translated.@@@@1@30@@danf@17-8-2009 10820580@unknown@formal@none@1@S@The main reasons for out-of-vocabulary words are the limited size of training data, domain changes and morphology.@@@@1@17@@danf@17-8-2009 10830010@unknown@formal@none@1@S@
Statistics
@@@@1@1@@danf@17-8-2009 10830020@unknown@formal@none@1@S@'''Statistics''' is a [[Mathematics|mathematical science]] pertaining to the collection, analysis, interpretation or explanation, and presentation of [[data]].@@@@1@17@@danf@17-8-2009 10830030@unknown@formal@none@1@S@It is applicable to a wide variety of [[academic discipline]]s, from the [[Natural science|natural]] and [[social science]]s to the [[humanities]], government and business.@@@@1@23@@danf@17-8-2009 10830040@unknown@formal@none@1@S@Statistical methods can be used to summarize or describe a collection of data; this is called '''[[descriptive statistics]]'''.@@@@1@18@@danf@17-8-2009 10830050@unknown@formal@none@1@S@In addition, patterns in the data may be [[mathematical model|modeled]] in a way that accounts for [[random]]ness and uncertainty in the observations, and then used to draw inferences about the process or population being studied; this is called '''[[inferential statistics]]'''.@@@@1@40@@danf@17-8-2009 10830060@unknown@formal@none@1@S@Both descriptive and inferential statistics comprise '''applied statistics'''.@@@@1@8@@danf@17-8-2009 10830070@unknown@formal@none@1@S@There is also a discipline called '''[[mathematical statistics]]''', which is concerned with the theoretical basis of the subject.@@@@1@18@@danf@17-8-2009 10830080@unknown@formal@none@1@S@The word '''''statistics''''' is also the plural of '''''[[statistic]]''''' (singular), which refers to the result of applying a statistical algorithm to a set of data, as in [[economic statistics]], [[crime statistics]], etc.@@@@1@32@@danf@17-8-2009 10830090@unknown@formal@none@1@S@==History==@@@@1@1@@danf@17-8-2009 10830100@unknown@formal@none@1@S@:@@@@1@1@@danf@17-8-2009 10830110@unknown@formal@none@1@S@''"Five men, [[Hermann Conring|Conring]],[[Gottfried Achenwall| Achenwall]], [[Johann Peter Süssmilch|Süssmilch]], [[John Graunt|Graunt]] and [[William Petty|Petty]] have been honored by different writers as the founder of statistics."'' claims one source (Willcox, Walter (1938) ''The Founder of Statistics''.@@@@1@35@@danf@17-8-2009 10830120@unknown@formal@none@1@S@Review of the [[International Statistical Institute]] 5(4):321-328.)@@@@1@7@@danf@17-8-2009 10830130@unknown@formal@none@1@S@Some scholars pinpoint the origin of statistics to 1662, with the publication of "[[Observations on the Bills of Mortality]]" by John Graunt.@@@@1@22@@danf@17-8-2009 10830140@unknown@formal@none@1@S@Early applications of statistical thinking revolved around the needs of states to base policy on demographic and economic data.@@@@1@19@@danf@17-8-2009 10830150@unknown@formal@none@1@S@The scope of the discipline of statistics broadened in the early 19th century to include the collection and analysis of data in general.@@@@1@23@@danf@17-8-2009 10830160@unknown@formal@none@1@S@Today, statistics is widely employed in government, business, and the natural and social sciences.@@@@1@14@@danf@17-8-2009 10830170@unknown@formal@none@1@S@Because of its empirical roots and its applications, statistics is generally considered not to be a subfield of pure mathematics, but rather a distinct branch of applied mathematics.@@@@1@28@@danf@17-8-2009 10830180@unknown@formal@none@1@S@Its mathematical foundations were laid in the 17th century with the development of [[probability theory]] by [[Pascal]] and [[Fermat]].@@@@1@19@@danf@17-8-2009 10830190@unknown@formal@none@1@S@Probability theory arose from the study of games of chance.@@@@1@10@@danf@17-8-2009 10830200@unknown@formal@none@1@S@The [[method of 
least squares]] was first described by [[Carl Friedrich Gauss]] around 1794.@@@@1@14@@danf@17-8-2009 10830210@unknown@formal@none@1@S@The use of modern [[computer]]s has expedited large-scale statistical computation, and has also made possible new methods that are impractical to perform manually.@@@@1@23@@danf@17-8-2009 10830220@unknown@formal@none@1@S@==Overview==@@@@1@1@@danf@17-8-2009 10830230@unknown@formal@none@1@S@In applying statistics to a scientific, industrial, or societal problem, one begins with a process or [[statistical population|population]] to be studied.@@@@1@21@@danf@17-8-2009 10830240@unknown@formal@none@1@S@This might be a population of people in a country, of crystal grains in a rock, or of goods manufactured by a particular factory during a given period.@@@@1@28@@danf@17-8-2009 10830250@unknown@formal@none@1@S@It may instead be a process observed at various times; data collected about this kind of "population" constitute what is called a [[time series]].@@@@1@24@@danf@17-8-2009 10830260@unknown@formal@none@1@S@For practical reasons, rather than compiling data about an entire population, one usually studies a chosen subset of the population, called a [[sampling (statistics)|sample]].@@@@1@24@@danf@17-8-2009 10830270@unknown@formal@none@1@S@Data are collected about the sample in an observational or [[experiment]]al setting.@@@@1@12@@danf@17-8-2009 10830280@unknown@formal@none@1@S@The data are then subjected to statistical analysis, which serves two related purposes: description and inference.@@@@1@16@@danf@17-8-2009 10830290@unknown@formal@none@1@S@*[[Descriptive statistics]] can be used to summarize the data, either numerically or graphically, to describe the sample.@@@@1@17@@danf@17-8-2009 10830300@unknown@formal@none@1@S@Basic examples of numerical descriptors include the [[mean]] and [[standard deviation]].@@@@1@11@@danf@17-8-2009 10830310@unknown@formal@none@1@S@Graphical summarizations include various kinds of charts and graphs.@@@@1@9@@danf@17-8-2009 10830320@unknown@formal@none@1@S@*[[Inferential statistics]] is used to model patterns in the data, accounting for randomness and drawing inferences about the larger population.@@@@1@20@@danf@17-8-2009 10830330@unknown@formal@none@1@S@These inferences may take the form of answers to yes/no questions ([[hypothesis testing]]), estimates of numerical characteristics ([[estimation]]), descriptions of association ([[correlation]]), or modeling of relationships ([[regression analysis|regression]]).@@@@1@28@@danf@17-8-2009 10830340@unknown@formal@none@1@S@Other [[mathematical model|modeling]] techniques include [[ANOVA]], [[time series]], and [[data mining]].@@@@1@11@@danf@17-8-2009 10830350@unknown@formal@none@1@S@The concept of correlation is particularly noteworthy.@@@@1@7@@danf@17-8-2009 10830360@unknown@formal@none@1@S@Statistical analysis of a [[data set]] may reveal that two variables (that is, two properties of the population under consideration) tend to vary together, as if they are connected.@@@@1@29@@danf@17-8-2009 10830370@unknown@formal@none@1@S@For example, a study of annual income and age of death among people might find that poor people tend to have shorter lives than affluent people.@@@@1@26@@danf@17-8-2009 10830380@unknown@formal@none@1@S@The two variables are said to be correlated (which is a positive correlation in this case).@@@@1@16@@danf@17-8-2009 10830390@unknown@formal@none@1@S@However, one cannot immediately infer the existence of a causal relationship between the two 
variables.@@@@1@15@@danf@17-8-2009 10830400@unknown@formal@none@1@S@(See [[Correlation does not imply causation]].)@@@@1@6@@danf@17-8-2009 10830410@unknown@formal@none@1@S@The correlated phenomena could be caused by a third, previously unconsidered phenomenon, called a [[lurking variable]] or [[confounding variable]].@@@@1@19@@danf@17-8-2009 10830420@unknown@formal@none@1@S@If the sample is representative of the population, then inferences and conclusions made from the sample can be extended to the population as a whole.@@@@1@25@@danf@17-8-2009 10830430@unknown@formal@none@1@S@A major problem lies in determining the extent to which the chosen sample is representative.@@@@1@15@@danf@17-8-2009 10830440@unknown@formal@none@1@S@Statistics offers methods to estimate and correct for randomness in the sample and in the data collection procedure, as well as methods for designing robust experiments in the first place.@@@@1@30@@danf@17-8-2009 10830450@unknown@formal@none@1@S@(See [[experimental design]].)@@@@1@3@@danf@17-8-2009 10830460@unknown@formal@none@1@S@The fundamental mathematical concept employed in understanding such randomness is [[probability]].@@@@1@11@@danf@17-8-2009 10830470@unknown@formal@none@1@S@[[Mathematical statistics]] (also called [[statistical theory]]) is the branch of [[applied mathematics]] that uses probability theory and [[mathematical analysis|analysis]] to examine the theoretical basis of statistics.@@@@1@26@@danf@17-8-2009 10830480@unknown@formal@none@1@S@The use of any statistical method is valid only when the system or population under consideration satisfies the basic mathematical assumptions of the method.@@@@1@24@@danf@17-8-2009 10830490@unknown@formal@none@1@S@[[Misuse of statistics]] can produce subtle but serious errors in description and interpretation — subtle in the sense that even experienced professionals sometimes make such errors, serious in the sense that they may affect, for instance, social policy, medical practice and the reliability of structures such as bridges.@@@@1@48@@danf@17-8-2009 10830500@unknown@formal@none@1@S@Even when statistics is correctly applied, the results can be difficult for the non-expert to interpret.@@@@1@16@@danf@17-8-2009 10830510@unknown@formal@none@1@S@For example, the [[statistical significance]] of a trend in the data, which measures the extent to which the trend could be caused by random variation in the sample, may not agree with one's intuitive sense of its significance.@@@@1@38@@danf@17-8-2009 10830520@unknown@formal@none@1@S@The set of basic statistical skills (and skepticism) needed by people to deal with information in their everyday lives is referred to as [[statistical literacy]].@@@@1@25@@danf@17-8-2009 10830530@unknown@formal@none@1@S@==Statistical methods==@@@@1@2@@danf@17-8-2009 10830540@unknown@formal@none@1@S@===Experimental and observational studies===@@@@1@4@@danf@17-8-2009 10830550@unknown@formal@none@1@S@A common goal for a statistical research project is to investigate [[causality]], and in particular to draw a conclusion on the effect of changes in the values of predictors or [[independent variable]]s on response or [[dependent variable]]s.@@@@1@37@@danf@17-8-2009 10830560@unknown@formal@none@1@S@There are two major types of causal statistical studies, experimental studies and observational studies.@@@@1@14@@danf@17-8-2009 10830570@unknown@formal@none@1@S@In both types of studies, the effect of differences of an independent variable (or variables) on the behavior of the dependent variable are 
observed.@@@@1@24@@danf@17-8-2009 10830580@unknown@formal@none@1@S@The difference between the two types lies in how the study is actually conducted.@@@@1@14@@danf@17-8-2009 10830590@unknown@formal@none@1@S@Each can be very effective.@@@@1@5@@danf@17-8-2009 10830600@unknown@formal@none@1@S@An experimental study involves taking measurements of the system under study, manipulating the system, and then taking additional measurements using the same procedure to determine if the manipulation has modified the values of the measurements.@@@@1@35@@danf@17-8-2009 10830610@unknown@formal@none@1@S@In contrast, an observational study does not involve experimental manipulation.@@@@1@10@@danf@17-8-2009 10830620@unknown@formal@none@1@S@Instead, data are gathered and correlations between predictors and response are investigated.@@@@1@12@@danf@17-8-2009 10830630@unknown@formal@none@1@S@An example of an experimental study is the famous [[Hawthorne studies]], which attempted to test the changes to the working environment at the Hawthorne plant of the Western Electric Company.@@@@1@30@@danf@17-8-2009 10830640@unknown@formal@none@1@S@The researchers were interested in determining whether increased illumination would increase the productivity of the [[assembly line]] workers.@@@@1@18@@danf@17-8-2009 10830650@unknown@formal@none@1@S@The researchers first measured the productivity in the plant, then modified the illumination in an area of the plant and checked if the changes in illumination affected the productivity.@@@@1@29@@danf@17-8-2009 10830660@unknown@formal@none@1@S@It turned out that the productivity indeed improved (under the experimental conditions).@@@@1@12@@danf@17-8-2009 10830663@unknown@formal@none@1@S@(See [[Hawthorne effect]].)@@@@1@3@@danf@17-8-2009 10830665@unknown@formal@none@1@S@However, the study is heavily criticized today for errors in experimental procedures, specifically for the lack of a [[control group]] and [[double-blind|blindedness]].@@@@1@22@@danf@17-8-2009 10830670@unknown@formal@none@1@S@An example of an observational study is a study which explores the correlation between smoking and lung cancer.@@@@1@18@@danf@17-8-2009 10830680@unknown@formal@none@1@S@This type of study typically uses a survey to collect observations about the area of interest and then performs statistical analysis.@@@@1@21@@danf@17-8-2009 10830690@unknown@formal@none@1@S@In this case, the researchers would collect observations of both smokers and non-smokers, perhaps through a [[case-control study]], and then look for the number of cases of lung cancer in each group.@@@@1@32@@danf@17-8-2009 10830700@unknown@formal@none@1@S@The basic steps of an experiment are;@@@@1@7@@danf@17-8-2009 10830710@unknown@formal@none@1@S@# Planning the research, including determining information sources, research subject selection, and [[ethics|ethical]] considerations for the proposed research and method.@@@@1@20@@danf@17-8-2009 10830720@unknown@formal@none@1@S@# [[Design of experiments]], concentrating on the system model and the interaction of independent and dependent variables.@@@@1@17@@danf@17-8-2009 10830730@unknown@formal@none@1@S@# [[summary statistics|Summarizing a collection of observations]] to feature their commonality by suppressing details.@@@@1@14@@danf@17-8-2009 10830740@unknown@formal@none@1@S@([[Descriptive statistics]])@@@@1@2@@danf@17-8-2009 10830750@unknown@formal@none@1@S@# Reaching consensus about what [[statistical inference|the observations tell]] about the world being observed.@@@@1@14@@danf@17-8-2009 
10830760@unknown@formal@none@1@S@([[Statistical inference]])@@@@1@2@@danf@17-8-2009 10830770@unknown@formal@none@1@S@# Documenting / presenting the results of the study.@@@@1@9@@danf@17-8-2009 10830780@unknown@formal@none@1@S@===Levels of measurement===@@@@1@3@@danf@17-8-2009 10830790@unknown@formal@none@1@S@:''See: [[Levels of measurement|Stanley Stevens' "Scales of measurement" (1946): nominal, ordinal, interval, ratio]]''@@@@1@13@@danf@17-8-2009 10830800@unknown@formal@none@1@S@There are four types of measurements or [[level of measurement|levels of measurement]] or measurement scales used in statistics: nominal, ordinal, interval, and ratio.@@@@1@23@@danf@17-8-2009 10830810@unknown@formal@none@1@S@They have different degrees of usefulness in statistical [[research]].@@@@1@9@@danf@17-8-2009 10830820@unknown@formal@none@1@S@Ratio measurements have both a zero value defined and the distances between different measurements defined; they provide the greatest flexibility in statistical methods that can be used for analyzing the data.@@@@1@31@@danf@17-8-2009 10830830@unknown@formal@none@1@S@Interval measurements have meaningful distances between measurements defined, but have no meaningful zero value defined (as in the case with IQ measurements or with temperature measurements in [[Fahrenheit]]).@@@@1@28@@danf@17-8-2009 10830840@unknown@formal@none@1@S@Ordinal measurements have imprecise differences between consecutive values, but have a meaningful order to those values.@@@@1@16@@danf@17-8-2009 10830850@unknown@formal@none@1@S@Nominal measurements have no meaningful rank order among values.@@@@1@9@@danf@17-8-2009 10830860@unknown@formal@none@1@S@Since variables conforming only to nominal or ordinal measurements cannot be reasonably measured numerically, sometimes they are called together as categorical variables, whereas ratio and interval measurements are grouped together as quantitative or [[continuous variables]] due to their numerical nature.@@@@1@40@@danf@17-8-2009 10830870@unknown@formal@none@1@S@===Statistical techniques===@@@@1@2@@danf@17-8-2009 10830880@unknown@formal@none@1@S@Some well known statistical [[Statistical hypothesis testing|test]]s and [[procedure]]s for [[research]] [[observation]]s are:@@@@1@13@@danf@17-8-2009 10830890@unknown@formal@none@1@S@* [[Student's t-test]]@@@@1@3@@danf@17-8-2009 10830900@unknown@formal@none@1@S@* [[chi-square test]]@@@@1@3@@danf@17-8-2009 10830910@unknown@formal@none@1@S@* [[Analysis of variance]] (ANOVA)@@@@1@5@@danf@17-8-2009 10830920@unknown@formal@none@1@S@* [[Mann-Whitney U]]@@@@1@3@@danf@17-8-2009 10830930@unknown@formal@none@1@S@* [[Regression analysis]]@@@@1@3@@danf@17-8-2009 10830940@unknown@formal@none@1@S@* [[Factor Analysis]]@@@@1@3@@danf@17-8-2009 10830950@unknown@formal@none@1@S@* [[Correlation]]@@@@1@2@@danf@17-8-2009 10830960@unknown@formal@none@1@S@* [[Pearson product-moment correlation coefficient]]@@@@1@5@@danf@17-8-2009 10830970@unknown@formal@none@1@S@* [[Spearman's rank correlation coefficient]]@@@@1@5@@danf@17-8-2009 10830980@unknown@formal@none@1@S@* [[Time Series Analysis]]@@@@1@4@@danf@17-8-2009 10830990@unknown@formal@none@1@S@==Specialized disciplines==@@@@1@2@@danf@17-8-2009 10831000@unknown@formal@none@1@S@Some fields of inquiry use applied statistics so extensively that they have [[specialized terminology]].@@@@1@14@@danf@17-8-2009 10831010@unknown@formal@none@1@S@These disciplines include:@@@@1@3@@danf@17-8-2009 10831020@unknown@formal@none@1@S@* [[Actuarial science]]@@@@1@3@@danf@17-8-2009 
10831030@unknown@formal@none@1@S@* [[Applied information economics]]@@@@1@4@@danf@17-8-2009 10831040@unknown@formal@none@1@S@* [[Biostatistics]]@@@@1@2@@danf@17-8-2009 10831050@unknown@formal@none@1@S@* [[Bootstrapping (statistics)|Bootstrap]] & [[Resampling (statistics)|Jackknife Resampling]]@@@@1@7@@danf@17-8-2009 10831060@unknown@formal@none@1@S@* [[Business statistics]]@@@@1@3@@danf@17-8-2009 10831070@unknown@formal@none@1@S@* [[Data analysis]]@@@@1@3@@danf@17-8-2009 10831080@unknown@formal@none@1@S@* [[Data mining]] (applying statistics and [[pattern recognition]] to discover knowledge from data)@@@@1@13@@danf@17-8-2009 10831090@unknown@formal@none@1@S@* [[Demography]]@@@@1@2@@danf@17-8-2009 10831100@unknown@formal@none@1@S@* [[Economic statistics]] (Econometrics)@@@@1@4@@danf@17-8-2009 10831110@unknown@formal@none@1@S@* [[Energy statistics]]@@@@1@3@@danf@17-8-2009 10831120@unknown@formal@none@1@S@* [[Engineering statistics]]@@@@1@3@@danf@17-8-2009 10831130@unknown@formal@none@1@S@* [[Environmental Statistics]]@@@@1@3@@danf@17-8-2009 10831140@unknown@formal@none@1@S@* [[Epidemiology]]@@@@1@2@@danf@17-8-2009 10831150@unknown@formal@none@1@S@* [[Geography]] and [[Geographic Information Systems]], more specifically in [[Spatial analysis]]@@@@1@11@@danf@17-8-2009 10831160@unknown@formal@none@1@S@* [[Image processing]]@@@@1@3@@danf@17-8-2009 10831170@unknown@formal@none@1@S@* [[Multivariate statistics|Multivariate Analysis]]@@@@1@4@@danf@17-8-2009 10831180@unknown@formal@none@1@S@* [[Psychological statistics]]@@@@1@3@@danf@17-8-2009 10831190@unknown@formal@none@1@S@* [[Quality]]@@@@1@2@@danf@17-8-2009 10831200@unknown@formal@none@1@S@* [[Social statistics]]@@@@1@3@@danf@17-8-2009 10831210@unknown@formal@none@1@S@* [[Statistical literacy]]@@@@1@3@@danf@17-8-2009 10831220@unknown@formal@none@1@S@* [[Statistical modeling]]@@@@1@3@@danf@17-8-2009 10831230@unknown@formal@none@1@S@* [[Statistical survey]]s@@@@1@3@@danf@17-8-2009 10831240@unknown@formal@none@1@S@* Process analysis and [[chemometrics]] (for analysis of data from [[analytical chemistry]] and [[chemical engineering]])@@@@1@15@@danf@17-8-2009 10831250@unknown@formal@none@1@S@* [[Structured data analysis (statistics)]]@@@@1@5@@danf@17-8-2009 10831260@unknown@formal@none@1@S@* [[Survival analysis]]@@@@1@3@@danf@17-8-2009 10831270@unknown@formal@none@1@S@* [[Reliability engineering]]@@@@1@3@@danf@17-8-2009 10831280@unknown@formal@none@1@S@* Statistics in various sports, particularly [[Baseball statistics|baseball]] and [[Cricket statistics|cricket]]@@@@1@11@@danf@17-8-2009 10831290@unknown@formal@none@1@S@Statistics form a key basis tool in business and manufacturing as well.@@@@1@12@@danf@17-8-2009 10831300@unknown@formal@none@1@S@It is used to understand measurement systems variability, control processes (as in [[statistical process control]] or SPC), for summarizing data, and to make data-driven decisions.@@@@1@25@@danf@17-8-2009 10831310@unknown@formal@none@1@S@In these roles, it is a key tool, and perhaps the only reliable tool.@@@@1@14@@danf@17-8-2009 10831320@unknown@formal@none@1@S@==Statistical computing==@@@@1@2@@danf@17-8-2009 10831330@unknown@formal@none@1@S@The rapid and sustained increases in computing power starting from the second half of the 20th century have had a substantial impact on the practice of statistical science.@@@@1@28@@danf@17-8-2009 10831340@unknown@formal@none@1@S@Early statistical models were almost always from the class of [[linear model]]s, but powerful computers, coupled with suitable numerical 
[[algorithms]], caused an increased interest in [[nonlinear regression|nonlinear models]] (especially [[neural networks]] and [[decision tree]]s) as well as the creation of new types, such as [[generalized linear model|generalised linear model]]s and [[multilevel model]]s.@@@@1@52@@danf@17-8-2009 10831350@unknown@formal@none@1@S@Increased computing power has also led to the growing popularity of computationally-intensive methods based on [[resampling (statistics)|resampling]], such as permutation tests and the [[bootstrapping (statistics)|bootstrap]], while techniques such as [[Gibbs sampling]] have made Bayesian methods more feasible.@@@@1@37@@danf@17-8-2009 10831360@unknown@formal@none@1@S@The computer revolution has implications for the future of statistics with new emphasis on "experimental" and "empirical" statistics.@@@@1@18@@danf@17-8-2009 10831370@unknown@formal@none@1@S@A large number of both general and special purpose [[List of statistical packages|statistical software]] are now available.@@@@1@17@@danf@17-8-2009 10831380@unknown@formal@none@1@S@== Misuse ==@@@@1@3@@danf@17-8-2009 10831390@unknown@formal@none@1@S@:@@@@1@1@@danf@17-8-2009 10831400@unknown@formal@none@1@S@There is a general perception that statistical knowledge is all-too-frequently intentionally [[Misuse of statistics|misused]] by finding ways to interpret only the data that are favorable to the presenter.@@@@1@28@@danf@17-8-2009 10831410@unknown@formal@none@1@S@A famous saying attributed to [[Benjamin Disraeli]] is, "[[Lies, damned lies, and statistics|There are three kinds of lies: lies, damned lies, and statistics]]"; and Harvard President [[Lawrence Lowell]] wrote in 1909 that statistics, ''"like veal pies, are good if you know the person that made them, and are sure of the ingredients"''.@@@@1@52@@danf@17-8-2009 10831420@unknown@formal@none@1@S@If various studies appear to contradict one another, then the public may come to distrust such studies.@@@@1@17@@danf@17-8-2009 10831430@unknown@formal@none@1@S@For example, one study may suggest that a given diet or activity raises [[blood pressure]], while another may suggest that it lowers blood pressure.@@@@1@24@@danf@17-8-2009 10831440@unknown@formal@none@1@S@The discrepancy can arise from subtle variations in experimental design, such as differences in the patient groups or research protocols, that are not easily understood by the non-expert.@@@@1@28@@danf@17-8-2009 10831450@unknown@formal@none@1@S@(Media reports sometimes omit this vital contextual information entirely.)@@@@1@9@@danf@17-8-2009 10831460@unknown@formal@none@1@S@By choosing (or rejecting, or modifying) a certain sample, results can be manipulated.@@@@1@13@@danf@17-8-2009 10831470@unknown@formal@none@1@S@Such manipulations need not be malicious or devious; they can arise from unintentional biases of the researcher.@@@@1@17@@danf@17-8-2009 10831480@unknown@formal@none@1@S@The graphs used to summarize data can also be misleading.@@@@1@10@@danf@17-8-2009 10831490@unknown@formal@none@1@S@Deeper criticisms come from the fact that the hypothesis testing approach, widely used and in many cases required by law or regulation, forces one hypothesis (the [[null hypothesis]]) to be "favored", and can also seem to exaggerate the importance of minor differences in large studies.@@@@1@45@@danf@17-8-2009 10831500@unknown@formal@none@1@S@A difference that is highly statistically significant can still be of no practical significance.@@@@1@14@@danf@17-8-2009 10831510@unknown@formal@none@1@S@(See [[Hypothesis 
test#Criticism|criticism of hypothesis testing]] and [[Null hypothesis#Controversy|controversy over the null hypothesis]].)@@@@1@13@@danf@17-8-2009 10831520@unknown@formal@none@1@S@One response is by giving a greater emphasis on the [[p-value|''p''-value]] than simply reporting whether a hypothesis is rejected at the given level of significance.@@@@1@25@@danf@17-8-2009 10831530@unknown@formal@none@1@S@The ''p''-value, however, does not indicate the size of the effect.@@@@1@11@@danf@17-8-2009 10831540@unknown@formal@none@1@S@Another increasingly common approach is to report [[confidence interval]]s.@@@@1@9@@danf@17-8-2009 10831550@unknown@formal@none@1@S@Although these are produced from the same calculations as those of hypothesis tests or ''p''-values, they describe both the size of the effect and the uncertainty surrounding it.@@@@1@28@@danf@17-8-2009 10840010@unknown@formal@none@1@S@
Syntax
@@@@1@1@@danf@17-8-2009 10840020@unknown@formal@none@1@S@In [[linguistics]], '''syntax''' (from [[Ancient Greek]] {{lang|grc|συν-}} ''syn-'', "together", and {{lang|grc|τάξις}} ''táxis'', "arrangement") is the study of the principles and rules for constructing [[sentence]]s in [[natural language]]s.@@@@1@27@@danf@17-8-2009 10840030@unknown@formal@none@1@S@In addition to referring to the discipline, the term ''syntax'' is also used to refer directly to the rules and principles that govern the sentence structure of any individual language, as in "the [[Irish syntax|syntax of Modern Irish]]".@@@@1@38@@danf@17-8-2009 10840040@unknown@formal@none@1@S@Modern research in syntax attempts to [[descriptive linguistics|describe languages]] in terms of such rules.@@@@1@14@@danf@17-8-2009 10840050@unknown@formal@none@1@S@Many professionals in this discipline attempt to find [[Universal Grammar|general rules]] that apply to all natural languages.@@@@1@17@@danf@17-8-2009 10840060@unknown@formal@none@1@S@The term ''syntax'' is also sometimes used to refer to the rules governing the behavior of mathematical systems, such as [[logic]], artificial formal languages, and computer programming languages.@@@@1@28@@danf@17-8-2009 10840070@unknown@formal@none@1@S@== Early history ==@@@@1@4@@danf@17-8-2009 10840080@unknown@formal@none@1@S@Works on grammar were being written long before modern syntax came about; the ''Aṣṭādhyāyī'' of [[Pāṇini]] is often cited as an example of a pre-modern work that approaches the sophistication of a modern syntactic theory.@@@@1@35@@danf@17-8-2009 10840090@unknown@formal@none@1@S@In the West, the school of thought that came to be known as "traditional grammar" began with the work of [[Dionysius Thrax]].@@@@1@22@@danf@17-8-2009 10840100@unknown@formal@none@1@S@For centuries, work in syntax was dominated by a framework known as {{lang|fr|''grammaire générale''}}, first expounded in 1660 by [[Antoine Arnauld]] in a book of the same title.@@@@1@28@@danf@17-8-2009 10840110@unknown@formal@none@1@S@This system took as its basic premise the assumption that language is a direct reflection of thought processes and therefore there is a single, most natural way to express a thought.@@@@1@31@@danf@17-8-2009 10840120@unknown@formal@none@1@S@That way, coincidentally, was exactly the way it was expressed in French.@@@@1@12@@danf@17-8-2009 10840130@unknown@formal@none@1@S@However, in the 19th century, with the development of [[historical-comparative linguistics]], linguists began to realize the sheer diversity of human language, and to question fundamental assumptions about the relationship between language and logic.@@@@1@33@@danf@17-8-2009 10840140@unknown@formal@none@1@S@It became apparent that there was no such thing as a most natural way to express a thought, and therefore logic could no longer be relied upon as a basis for studying the structure of language.@@@@1@36@@danf@17-8-2009 10840150@unknown@formal@none@1@S@The Port-Royal grammar modeled the study of syntax upon that of logic (indeed, large parts of the [[Port-Royal Logic]] were copied or adapted from the ''Grammaire générale'').@@@@1@27@@danf@17-8-2009 10840160@unknown@formal@none@1@S@Syntactic categories were identified with logical ones, and all sentences were analyzed in terms of "Subject – Copula – Predicate".@@@@1@20@@danf@17-8-2009 10840170@unknown@formal@none@1@S@Initially, this view was adopted even by the early comparative linguists such as [[Franz Bopp]].@@@@1@15@@danf@17-8-2009 10840180@unknown@formal@none@1@S@The central 
role of syntax within theoretical linguistics became clear only in the 20th century, which could reasonably be called the "century of syntactic theory" as far as linguistics is concerned.@@@@1@31@@danf@17-8-2009 10840190@unknown@formal@none@1@S@For a detailed and critical survey of the history of syntax in the last two centuries, see the monumental work by Graffi (2001).@@@@1@23@@danf@17-8-2009 10840200@unknown@formal@none@1@S@==Modern theories==@@@@1@2@@danf@17-8-2009 10840210@unknown@formal@none@1@S@There are a number of theoretical approaches to the discipline of syntax.@@@@1@12@@danf@17-8-2009 10840220@unknown@formal@none@1@S@Many linguists (e.g. [[Noam Chomsky]]) see syntax as a branch of biology, since they conceive of syntax as the study of linguistic knowledge as embodied in the human [[mind]].@@@@1@29@@danf@17-8-2009 10840240@unknown@formal@none@1@S@Others (e.g. [[Gerald Gazdar]]) take a more [[Philosophy of mathematics#Platonism|Platonistic]] view, since they regard syntax to be the study of an abstract [[formal system]].@@@@1@24@@danf@17-8-2009 10840260@unknown@formal@none@1@S@Yet others (e.g. [[Joseph Greenberg]]) consider grammar a taxonomical device to reach broad generalizations across languages.@@@@1@16@@danf@17-8-2009 10840280@unknown@formal@none@1@S@Some of the major approaches to the discipline are listed below.@@@@1@11@@danf@17-8-2009 10840290@unknown@formal@none@1@S@===Generative grammar===@@@@1@2@@danf@17-8-2009 10840300@unknown@formal@none@1@S@The hypothesis of [[generative grammar]] is that language is a structure of the human mind.@@@@1@15@@danf@17-8-2009 10840310@unknown@formal@none@1@S@The goal of generative grammar is to make a complete model of this inner language (known as ''[[i-language]]'').@@@@1@18@@danf@17-8-2009 10840320@unknown@formal@none@1@S@This model could be used to describe all human language and to predict the [[grammaticality]] of any given utterance (that is, to predict whether the utterance would sound correct to native speakers of the language).@@@@1@35@@danf@17-8-2009 10840330@unknown@formal@none@1@S@This approach to language was pioneered by [[Noam Chomsky]].@@@@1@9@@danf@17-8-2009 10840340@unknown@formal@none@1@S@Most generative theories (although not all of them) assume that syntax is based upon the constituent structure of sentences.@@@@1@19@@danf@17-8-2009 10840350@unknown@formal@none@1@S@Generative grammars are among the theories that focus primarily on the form of a sentence, rather than its communicative function.@@@@1@20@@danf@17-8-2009 10840360@unknown@formal@none@1@S@Among the many generative theories of linguistics are:@@@@1@8@@danf@17-8-2009 10840370@unknown@formal@none@1@S@*[[Transformational Grammar]] (TG) (now largely out of date)@@@@1@8@@danf@17-8-2009 10840380@unknown@formal@none@1@S@*[[Government and binding theory]] (GB) (common in the late 1970s and 1980s)@@@@1@12@@danf@17-8-2009 10840390@unknown@formal@none@1@S@*[[Linguistic minimalism|Minimalism]] (MP) (the most recent Chomskyan version of generative grammar)@@@@1@11@@danf@17-8-2009 10840400@unknown@formal@none@1@S@Other theories that find their origin in the generative paradigm are:@@@@1@11@@danf@17-8-2009 10840410@unknown@formal@none@1@S@*[[Generative semantics]] (now largely out of date)@@@@1@7@@danf@17-8-2009 10840420@unknown@formal@none@1@S@*[[Relational grammar]] (RG) (now largely out of date)@@@@1@8@@danf@17-8-2009 10840430@unknown@formal@none@1@S@*[[Arc Pair grammar]]@@@@1@3@@danf@17-8-2009 10840440@unknown@formal@none@1@S@*[[Generalised phrase structure 
grammar|Generalized phrase structure grammar]] (GPSG; now largely out of date)@@@@1@13@@danf@17-8-2009 10840450@unknown@formal@none@1@S@*[[Head-driven phrase structure grammar]] (HPSG)@@@@1@5@@danf@17-8-2009 10840460@unknown@formal@none@1@S@*[[Lexical-functional grammar]] (LFG)@@@@1@3@@danf@17-8-2009 10840470@unknown@formal@none@1@S@===Categorial grammar ===@@@@1@3@@danf@17-8-2009 10840480@unknown@formal@none@1@S@[[Categorial grammar]] is an approach that attributes the syntactic structure not to rules of grammar, but to the properties of the [[syntactic categories]] themselves.@@@@1@24@@danf@17-8-2009 10840490@unknown@formal@none@1@S@For example, rather than asserting that sentences are constructed by a rule that combines a noun phrase (NP) and a verb phrase (VP) (e.g. the [[phrase structure rule]] S → NP VP), in categorial grammar, such principles are embedded in the category of the [[head (linguistics)|head]] word itself.@@@@1@48@@danf@17-8-2009 10840500@unknown@formal@none@1@S@So the syntactic category for an [[intransitive]] verb is a complex formula representing the fact that the verb acts as a [[functor]] which requires an NP as an input and produces a sentence level structure as an output.@@@@1@38@@danf@17-8-2009 10840510@unknown@formal@none@1@S@This complex category is notated as (NP\\S) instead of V.@@@@1@10@@danf@17-8-2009 10840515@unknown@formal@none@1@S@NP\\S is read as " a category that searches to the left (indicated by \\) for a NP (the element on the left) and outputs a sentence (the element on the right)".@@@@1@32@@danf@17-8-2009 10840520@unknown@formal@none@1@S@The category of [[transitive verb]] is defined as an element that requires two NPs (its subject and its direct object) to form a sentence.@@@@1@24@@danf@17-8-2009 10840530@unknown@formal@none@1@S@This is notated as (NP/(NP\\S)) which means "a category that searches to the right (indicated by /) for an NP (the object), and generates a function (equivalent to the VP) which is (NP\\S), which in turn represents a function that searches to the left for an NP and produces a sentence).@@@@1@51@@danf@17-8-2009 10840540@unknown@formal@none@1@S@[[Tree-adjoining grammar]] is a categorial grammar that adds in partial [[tree structure]]s to the categories.@@@@1@15@@danf@17-8-2009 10840550@unknown@formal@none@1@S@===Dependency grammar===@@@@1@2@@danf@17-8-2009 10840560@unknown@formal@none@1@S@[[Dependency grammar]] is a different type of approach in which structure is determined by the [[relation]]s (such as [[grammatical relation]]s) between a word (a ''[[head (linguistics)|head]]'') and its dependents, rather than being based in constituent structure.@@@@1@36@@danf@17-8-2009 10840570@unknown@formal@none@1@S@For example, syntactic structure is described in terms of whether a particular [[noun]] is the [[subject]] or [[agent]] of the [[verb]], rather than describing the relations in terms of trees (one version of which is the [[parse tree]]) or other structural system.@@@@1@42@@danf@17-8-2009 10840580@unknown@formal@none@1@S@Some dependency-based theories of syntax:@@@@1@5@@danf@17-8-2009 10840590@unknown@formal@none@1@S@*[[Algebraic syntax]]@@@@1@2@@danf@17-8-2009 10840600@unknown@formal@none@1@S@*[[Word grammar]]@@@@1@2@@danf@17-8-2009 10840610@unknown@formal@none@1@S@*[[Operator Grammar]]@@@@1@2@@danf@17-8-2009 10840620@unknown@formal@none@1@S@===Stochastic/probabilistic grammars/network theories ===@@@@1@4@@danf@17-8-2009 10840630@unknown@formal@none@1@S@Theoretical approaches to syntax that are based upon [[probability 
theory]] are known as [[stochastic grammar]]s.@@@@1@15@@danf@17-8-2009 10840640@unknown@formal@none@1@S@One common implementation of such an approach makes use of [[neural network]]s ([[connectionism]]).@@@@1@15@@danf@17-8-2009 10840650@unknown@formal@none@1@S@Some theories based on this approach are:@@@@1@7@@danf@17-8-2009 10840660@unknown@formal@none@1@S@*[[Optimality theory]]@@@@1@2@@danf@17-8-2009 10840670@unknown@formal@none@1@S@*[[Stochastic context-free grammar]]@@@@1@3@@danf@17-8-2009 10840680@unknown@formal@none@1@S@===Functionalist grammars===@@@@1@2@@danf@17-8-2009 10840690@unknown@formal@none@1@S@Functionalist theories, although focused upon form, are driven by explanations based upon the function of a sentence (i.e. its communicative function).@@@@1@21@@danf@17-8-2009 10840700@unknown@formal@none@1@S@Some typical functionalist theories include:@@@@1@5@@danf@17-8-2009 10840710@unknown@formal@none@1@S@*[[Functional grammar]] (Dik)@@@@1@3@@danf@17-8-2009 10840720@unknown@formal@none@1@S@*[[Prague Linguistic Circle]]@@@@1@3@@danf@17-8-2009 10840730@unknown@formal@none@1@S@*[[Systemic functional grammar]]@@@@1@3@@danf@17-8-2009 10840740@unknown@formal@none@1@S@*[[Cognitive grammar]]@@@@1@2@@danf@17-8-2009 10840750@unknown@formal@none@1@S@*[[Construction grammar]] (CxG)@@@@1@3@@danf@17-8-2009 10840760@unknown@formal@none@1@S@*[[Role and reference grammar]] (RRG)@@@@1@5@@danf@17-8-2009 10850010@unknown@formal@none@1@S@
SYSTRAN
@@@@1@1@@danf@17-8-2009 10850020@unknown@formal@none@1@S@'''SYSTRAN''', founded by Dr. [[Peter Toma]] in [[1968]], is one of the oldest [[machine translation]] companies.@@@@1@16@@danf@17-8-2009 10850030@unknown@formal@none@1@S@SYSTRAN has done extensive work for the [[United States Department of Defense]] and the [[European Commission]].@@@@1@16@@danf@17-8-2009 10850040@unknown@formal@none@1@S@SYSTRAN provides the technology for [[Yahoo!]] and [[AltaVista]]'s [[Babel Fish (website)|Babel Fish]], among others, but [[Google]]'s [[List of Google products#anchor_language_tools|language tools]] stopped using it (circa 2007) for all of the language combinations they offer.@@@@1@36@@danf@17-8-2009 10850050@unknown@formal@none@1@S@Commercial versions of SYSTRAN run on the [[Microsoft Windows]] (including [[Windows Mobile]]), [[Linux]], and [[Solaris (operating system)|Solaris]] operating systems.@@@@1@18@@danf@17-8-2009 10850060@unknown@formal@none@1@S@== History ==@@@@1@3@@danf@17-8-2009 10850070@unknown@formal@none@1@S@With its origin in the [[Georgetown-IBM experiment|Georgetown]] machine translation effort, SYSTRAN was one of the few machine translation systems to survive the major decrease of funding after the [[ALPAC|ALPAC Report]] of the mid-1960s.@@@@1@33@@danf@17-8-2009 10850080@unknown@formal@none@1@S@The company was established in [[La Jolla, San Diego, California|La Jolla]], [[California]], to work on translation of Russian to English text for the [[United States Air Force]] during the "[[Cold War]]".@@@@1@31@@danf@17-8-2009 10850090@unknown@formal@none@1@S@Large numbers of Russian scientific and technical documents were translated using SYSTRAN under the auspices of the USAF Foreign Technology Division (later the National Air and Space Intelligence Center) at [[Wright-Patterson Air Force Base]], Ohio.@@@@1@35@@danf@17-8-2009 10850100@unknown@formal@none@1@S@The quality of the translations, although only approximate, was usually adequate for understanding content.@@@@1@14@@danf@17-8-2009 10850110@unknown@formal@none@1@S@The company was sold in 1986 to the Gachot family, based in [[Paris]], [[France]], and is now traded publicly on the French stock exchange.@@@@1@24@@danf@17-8-2009 10850120@unknown@formal@none@1@S@It has a main office at the [[Grande Arche]] in [[La Defense]] and maintains a secondary office in [[La Jolla, San Diego, California]].@@@@1@23@@danf@17-8-2009 10850130@unknown@formal@none@1@S@== Languages ==@@@@1@3@@danf@17-8-2009 10850140@unknown@formal@none@1@S@Here is a list of the source and target languages SYSTRAN works with.@@@@1@13@@danf@17-8-2009 10850150@unknown@formal@none@1@S@Many of the pairs are to or from English or French.@@@@1@11@@danf@17-8-2009 10850160@unknown@formal@none@1@S@* Russian into English (1968)@@@@1@5@@danf@17-8-2009 10850170@unknown@formal@none@1@S@* English into Russian (1973) for the [[Apollo-Soyuz]] project@@@@1@9@@danf@17-8-2009 10850180@unknown@formal@none@1@S@* English source (1975) for the [[European Commission]]@@@@1@8@@danf@17-8-2009 10850190@unknown@formal@none@1@S@* Arabic@@@@1@2@@danf@17-8-2009 10850200@unknown@formal@none@1@S@* Chinese@@@@1@2@@danf@17-8-2009 10850210@unknown@formal@none@1@S@* Danish@@@@1@2@@danf@17-8-2009 10850220@unknown@formal@none@1@S@* Dutch@@@@1@2@@danf@17-8-2009 10850230@unknown@formal@none@1@S@* French@@@@1@2@@danf@17-8-2009 10850240@unknown@formal@none@1@S@* German@@@@1@2@@danf@17-8-2009 10850250@unknown@formal@none@1@S@* Greek@@@@1@2@@danf@17-8-2009 10850260@unknown@formal@none@1@S@* 
Hindi@@@@1@2@@danf@17-8-2009 10850270@unknown@formal@none@1@S@* Italian@@@@1@2@@danf@17-8-2009 10850280@unknown@formal@none@1@S@* Japanese@@@@1@2@@danf@17-8-2009 10850290@unknown@formal@none@1@S@* Korean@@@@1@2@@danf@17-8-2009 10850300@unknown@formal@none@1@S@* Norwegian@@@@1@2@@danf@17-8-2009 10850310@unknown@formal@none@1@S@* Serbo-Croatian@@@@1@2@@danf@17-8-2009 10850320@unknown@formal@none@1@S@* Spanish@@@@1@2@@danf@17-8-2009 10850330@unknown@formal@none@1@S@* Swedish@@@@1@2@@danf@17-8-2009 10850340@unknown@formal@none@1@S@* Persian@@@@1@2@@danf@17-8-2009 10850350@unknown@formal@none@1@S@* Polish@@@@1@2@@danf@17-8-2009 10850360@unknown@formal@none@1@S@* Portuguese@@@@1@2@@danf@17-8-2009 10850370@unknown@formal@none@1@S@* Ukrainian@@@@1@2@@danf@17-8-2009 10850380@unknown@formal@none@1@S@* Urdu@@@@1@2@@danf@17-8-2009 10860010@unknown@formal@none@1@S@
Text analytics
@@@@1@2@@danf@17-8-2009 10860020@unknown@formal@none@1@S@The term '''text analytics''' describes a set of linguistic, lexical, pattern recognition, extraction, tagging/structuring, visualization, and predictive techniques.@@@@1@18@@danf@17-8-2009 10860030@unknown@formal@none@1@S@The term also describes processes that apply these techniques, whether independently or in conjunction with query and analysis of fielded, numerical data, to solve business problems.@@@@1@26@@danf@17-8-2009 10860040@unknown@formal@none@1@S@These techniques and processes discover and present knowledge – facts, business rules, and relationships – that is otherwise locked in textual form, impenetrable to automated processing.@@@@1@26@@danf@17-8-2009 10860050@unknown@formal@none@1@S@A typical application is to scan a set of documents written in a [[natural language]] and either model the document set for predictive classification purposes or populate a database or search index with the information extracted.@@@@1@36@@danf@17-8-2009 10860060@unknown@formal@none@1@S@Current approaches to text analytics use [[natural language processing]] techniques that focus on specialized domains.@@@@1@15@@danf@17-8-2009 10860070@unknown@formal@none@1@S@Typical subtasks are:@@@@1@3@@danf@17-8-2009 10860080@unknown@formal@none@1@S@* [[Named Entity Recognition]]: recognition of entity names (for people and organizations), place names, temporal expressions, and certain types of numerical expressions.@@@@1@22@@danf@17-8-2009 10860090@unknown@formal@none@1@S@* [[Coreference]]: identification of chains of [[noun phrase]]s that refer to the same object.@@@@1@13@@danf@17-8-2009 10860100@unknown@formal@none@1@S@For example, [[Anaphora (linguistics)|anaphora]] is a type of coreference.@@@@1@9@@danf@17-8-2009 10860110@unknown@formal@none@1@S@* [[Relationship Extraction]]: extraction of named relationships between entities in text (a minimal illustrative sketch of entity and relationship extraction appears below).@@@@1@11@@danf@17-8-2009
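To make these subtasks concrete, the following is a minimal, illustrative sketch of pattern-based named entity recognition and relationship extraction, written in Python using only the standard re module. It is not the method of any particular text analytics product: the regular expressions, function names, and sample sentence are invented for illustration (the sample reuses the SYSTRAN founding fact mentioned earlier in this document), and real systems rely on trained statistical or linguistic models rather than hand-written patterns; coreference is omitted because it requires considerably more machinery.
<pre>
import re

# Toy patterns, illustrative only; production systems use trained models.
NAME_PATTERN = re.compile(r"\b[A-Z][A-Za-z]+(?:\s+[A-Z][A-Za-z]+)*\b")
YEAR_PATTERN = re.compile(r"\b\d{4}\b")
FOUNDED_BY_PATTERN = re.compile(
    r"(?P<subj>[A-Z][A-Za-z]+(?:\s+[A-Z][A-Za-z]+)*)\s+was founded by\s+"
    r"(?P<obj>[A-Z][A-Za-z]+(?:\s+[A-Z][A-Za-z]+)*)"
)

def extract_entities(text):
    """Crude named entity recognition: capitalized spans and four-digit years."""
    return {"names": NAME_PATTERN.findall(text),
            "years": YEAR_PATTERN.findall(text)}

def extract_relations(text):
    """Toy relationship extraction driven by a single hand-written pattern."""
    return [(m.group("subj"), "founded_by", m.group("obj"))
            for m in FOUNDED_BY_PATTERN.finditer(text)]

if __name__ == "__main__":
    sample = "SYSTRAN was founded by Peter Toma in 1968."
    print(extract_entities(sample))   # {'names': ['SYSTRAN', 'Peter Toma'], 'years': ['1968']}
    print(extract_relations(sample))  # [('SYSTRAN', 'founded_by', 'Peter Toma')]
</pre>
Even this toy version shows the typical pipeline shape: entity mentions are identified first, and relationships are then asserted over the recognized mentions; replacing the regular expressions with learned models preserves that structure.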