
An Analysis of Mainstream Chinese Information Processing Technologies

Keywords: information processing; N-gram model; speech recognition; parsing
Abstract: This paper analyzes the mainstream technologies of Chinese information processing, focusing on three important areas: the N-gram model, speech recognition and parsing.


First, the characteristics of Chinese information processing
(A) Particularity of Chinese characters
In computer information processing, English has the advantage of a small alphabet whose letters are easy to input, output and process. Chinese characters, by contrast, are numerous and complex in shape, which makes character encoding considerably harder. Chinese information processing therefore encodes characters in different forms to meet different requirements. In summary, there are several kinds of code: input codes, standard interchange codes, internal codes and glyph (character image) codes.
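To make the difference between two of these code types concrete, the sketch below assumes GB 2312, the mainland Chinese national character set, as the example standard (the article does not name one). For a GB 2312 character, the row-cell (qu-wei) number, the national interchange code and the internal code differ only by fixed byte offsets. Input codes and glyph codes are not shown, and the function name `gb2312_codes` is hypothetical.

```python
def gb2312_codes(ch: str) -> dict:
    """Return the row-cell (qu-wei) code, the interchange code and the
    internal code of one GB 2312 character (GB 2312 is an assumed example)."""
    internal = ch.encode("gb2312")  # two-byte internal code as stored in memory
    if len(internal) != 2:
        raise ValueError(f"{ch!r} is not a double-byte GB 2312 character")
    hi, lo = internal
    # Interchange code = internal code minus 0x80 in each byte.
    exchange = ((hi - 0x80) << 8) | (lo - 0x80)
    # Row-cell numbers = internal code minus 0xA0 in each byte.
    row, cell = hi - 0xA0, lo - 0xA0
    return {
        "quwei": f"{row:02d}{cell:02d}",
        "exchange": f"{exchange:04X}",
        "internal": internal.hex().upper(),
    }

print(gb2312_codes("中"))
# → {'quwei': '5448', 'exchange': '5650', 'internal': 'D6D0'}
```

The same character thus has three numeric identities depending on whether it is being catalogued, exchanged between systems or stored internally, which is one reason Chinese encoding involves several code forms.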

(B) Particularity of written Chinese
Another feature of written Chinese is that there is no explicit mark separating words, which makes automatic word analysis of written text a problem in its own right. Word segmentation must divide a continuous string of characters into words according to certain specifications. In English, words are delimited by spaces, whereas written Chinese only marks the coarse division into sentences and paragraphs, so finding word boundaries is one of the main difficulties. English also faces the problem of grouping words into phrases, but because Chinese words are more numerous and varied than English words, the problem is harder to deal with in Chinese.
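Forward maximum matching is one common dictionary-based baseline for segmentation; the article does not prescribe a method, so the sketch below is only an illustration, and the toy dictionary is hypothetical. At each position it takes the longest dictionary word it can find, falling back to a single character.

```python
from typing import List, Set

def forward_max_match(text: str, dictionary: Set[str], max_len: int = 4) -> List[str]:
    """Greedy left-to-right segmentation: always take the longest dictionary
    word that starts at the current position (or one character if none)."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words

toy_dict = {"研究", "研究生", "生命", "命", "的", "起源"}
print(forward_max_match("研究生命的起源", toy_dict))
# → ['研究生', '命', '的', '起源']
```

The output also shows the ambiguity the paragraph describes: the intended reading is 研究 / 生命 / 的 / 起源 ("study the origin of life"), but greedy matching commits to 研究生 ("graduate student") and leaves a stray 命.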

(C) Particularity of spoken Chinese
In terms of speech, Chinese has a relatively simple syllable structure with fairly clear syllable boundaries. Tones and tone changes (tone sandhi), however, are a significant difference between Chinese and English, and they are a difficulty for speech recognition and speech synthesis. On the whole, though, Chinese speech processing is still relatively easy compared with that of other languages.

(D) Particularity of Chinese grammar
In grammar, the syntactic function of a Chinese word is relatively hard to determine, unlike English, which expresses grammatical distinctions through rich changes in word form. Chinese relies mainly on word order and function words to express different meanings, so without a good grasp of syntax, ambiguity arises easily. Automatic analysis of Chinese sentences is therefore an important but difficult technology to master.

Second, several Chinese information processing technologies
(A) The N-gram model
Let $w_i$ be any word in a text. If the two words preceding it, $w_{i-2}w_{i-1}$, are known, the conditional probability $P(w_i \mid w_{i-2}w_{i-1})$ can be used to predict the probability of $w_i$. This is the idea behind a statistical language model. In general, let $W = w_1 w_2 \ldots w_n$ be an arbitrary sequence of $n$ words in a text; the statistical language model is the probability $P(W)$ of this word sequence occurring in text. Using the chain rule of probability, $P(W)$ can be expanded as:

$$P(W) = P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_1 w_2) \cdots P(w_n \mid w_1 w_2 \ldots w_{n-1})$$

It is easy to see that predicting the probability of $w_n$ requires the probabilities of all the words before it, which is far too complex to compute. If we assume that the probability of any word $w_i$ depends only on the two words before it, the problem is greatly simplified, and the resulting language model is called the trigram model:

$$P(W) \approx P(w_1)\,P(w_2 \mid w_1) \prod_{i=3}^{n} P(w_i \mid w_{i-2} w_{i-1})$$

where $\prod_{i=3}^{n}$ denotes the product of the conditional probabilities for $i = 3, \ldots, n$. In general, an N-gram model assumes that the probability of the current word depends only on the $N-1$ words before it. Importantly, these probabilities are parameters that can be estimated from a large-scale corpus. For example, the trigram probability is estimated as

$$P(w_i \mid w_{i-2} w_{i-1}) \approx \frac{\operatorname{count}(w_{i-2} w_{i-1} w_i)}{\operatorname{count}(w_{i-2} w_{i-1})}$$

where $\operatorname{count}(\ldots)$ is the total number of times a given word sequence appears in the entire corpus.
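The count-ratio estimate and the trigram product can be sketched directly. This is a minimal illustration under stated assumptions: the toy corpus is made up and already segmented, each sentence is padded with two hypothetical `<s>` start symbols so that $P(w_1)$ and $P(w_2 \mid w_1)$ become $P(w_1 \mid \langle s\rangle\langle s\rangle)$ and $P(w_2 \mid \langle s\rangle w_1)$, and no smoothing is applied.

```python
from collections import Counter

BOS = "<s>"  # sentence-start padding symbol (an assumption, not from the article)

def train_trigram(sentences):
    """Count trigrams and their two-word histories in a segmented corpus."""
    trigram_counts, history_counts = Counter(), Counter()
    for words in sentences:
        padded = [BOS, BOS] + list(words)
        for i in range(2, len(padded)):
            trigram_counts[tuple(padded[i - 2:i + 1])] += 1
            # Count the history only where a word follows it, so the estimated
            # conditional probabilities for each history sum to 1.
            history_counts[tuple(padded[i - 2:i])] += 1
    return trigram_counts, history_counts

def trigram_prob(trigram_counts, history_counts, w2, w1, w):
    """Maximum-likelihood estimate P(w | w2 w1) = count(w2 w1 w) / count(w2 w1)."""
    history = history_counts[(w2, w1)]
    return trigram_counts[(w2, w1, w)] / history if history else 0.0

def sentence_prob(trigram_counts, history_counts, words):
    """P(W) under the trigram approximation, using the start padding."""
    padded = [BOS, BOS] + list(words)
    p = 1.0
    for i in range(2, len(padded)):
        p *= trigram_prob(trigram_counts, history_counts, *padded[i - 2:i + 1])
    return p

corpus = [["我", "喜欢", "苹果"], ["我", "喜欢", "香蕉"], ["你", "喜欢", "苹果"]]
tri, hist = train_trigram(corpus)
print(trigram_prob(tri, hist, "我", "喜欢", "苹果"))    # → 0.5
print(sentence_prob(tri, hist, ["我", "喜欢", "苹果"]))  # 2/3 * 1 * 1/2, about 0.333
print(sentence_prob(tri, hist, ["你", "喜欢", "香蕉"]))  # → 0.0 (unseen trigram)
```

The last line shows the data-sparseness problem of raw count ratios: a perfectly reasonable sentence gets probability zero because one trigram never occurred in the corpus, which is why practical systems smooth these estimates.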

(B) Speech recognition
The ultimate goal of speech recognition is to enable truly free communication between humans and computers, so that the machine understands human language and responds promptly and accurately. Speech recognition draws on signal processing, pattern recognition, probability theory and information theory, the mechanisms of speech production and hearing, artificial intelligence and other fields. Its core technologies cover three aspects: feature extraction, pattern matching criteria and model training. It also involves choosing the recognition unit; for Chinese, the syllable is usually used as the recognition unit. Feature extraction deserves particular attention: the speech signal carries a wealth of information, and the parameters extracted from it are called acoustic features. The quality of these feature parameters is key to recognition performance, so extraction should capture as much of the information carrying the intended meaning as possible while discarding interference such as the speaker's personal characteristics, to keep the feature parameters valid and accurate.
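As one illustration of a pattern matching criterion, the sketch below uses dynamic time warping (DTW), a classic template-matching method for isolated-unit recognition; the article does not say which matching method it has in mind. It assumes feature extraction has already turned each utterance into a sequence of feature vectors. The two-dimensional frames, the template labels and all numbers are made up for illustration.

```python
import math

def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of feature vectors.

    D[i][j] is the cheapest cumulative cost of aligning a[:i] with b[:j];
    each step may advance in a, in b, or in both, which absorbs differences
    in speaking rate.
    """
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])  # Euclidean distance between frames
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def recognize(features, templates):
    """Return the label whose reference template is closest under DTW."""
    return min(templates, key=lambda label: dtw_distance(features, templates[label]))

# Hypothetical syllable templates built from toy 2-D "feature" frames.
templates = {
    "ni": [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)],
    "hao": [(0.0, 1.0), (0.0, 2.0), (0.0, 3.0)],
}
utterance = [(1.0, 0.0), (1.0, 0.0), (2.0, 0.1), (3.0, 0.0)]  # a slowed-down "ni"
print(recognize(utterance, templates))  # → ni
```

Because the warping path may repeat frames, the four-frame utterance still aligns closely with the three-frame "ni" template; this is the kind of speaking-rate variation a matching criterion has to tolerate.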

(C) Parsing
Syntactic analysis (parsing) analyzes sentences and paragraphs according to the grammatical features of Chinese, using a phrase structure tree to represent the relationships between the elements of each sentence. Its main questions are: which simple sentences a sentence consists of; what syntactic role each element plays within a simple sentence; what grammatical structures exist above the level of the simple sentence; what type each phrase in the sentence is and what role it plays; and finally, how all these components combine or attach to form an organic whole sentence. These are the main contents of syntactic structure analysis, and this kind of analysis is known as chart parsing. It is worth noting that in English the subject must be placed before the predicate, otherwise the meaning changes completely, although inverted sentence structures do occur in certain circumstances. This is a significant difference from Chinese.
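A minimal sketch of chart-based parsing is the CKY algorithm, one simple member of the chart-parsing family; the article does not specify which variant it means. The sketch assumes a hypothetical toy grammar in Chomsky normal form and an input that has already been segmented into words. It fills a chart bottom-up with every category that can span each substring, then reads back one phrase structure tree.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy grammar in Chomsky normal form (binary rules + lexicon).
BINARY_RULES = {
    ("NP", "VP"): {"S"},
    ("V", "NP"): {"VP"},
}
LEXICON = {"我": {"NP"}, "苹果": {"NP"}, "喜欢": {"V"}}

def cky_parse(words):
    """Fill the chart: chart[(i, j)] holds every category that can span
    words[i:j]; back records how each entry was built."""
    n = len(words)
    chart = defaultdict(set)
    back = {}
    for i, word in enumerate(words):
        for cat in LEXICON.get(word, ()):
            chart[(i, i + 1)].add(cat)
            back[(i, i + 1, cat)] = word
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point between the two children
                for left, right in product(chart[(i, k)], chart[(k, j)]):
                    for parent in BINARY_RULES.get((left, right), ()):
                        if parent not in chart[(i, j)]:
                            chart[(i, j)].add(parent)
                            back[(i, j, parent)] = (k, left, right)
    return chart, back

def build_tree(back, i, j, cat):
    """Rebuild one phrase structure tree as nested tuples."""
    entry = back[(i, j, cat)]
    if isinstance(entry, str):  # a leaf: (category, word)
        return (cat, entry)
    k, left, right = entry
    return (cat, build_tree(back, i, k, left), build_tree(back, k, j, right))

words = ["我", "喜欢", "苹果"]  # "I like apples", already segmented
chart, back = cky_parse(words)
if "S" in chart[(0, len(words))]:
    print(build_tree(back, 0, len(words), "S"))
# → ('S', ('NP', '我'), ('VP', ('V', '喜欢'), ('NP', '苹果')))
```

The chart stores partial results for every substring, so each sub-analysis is computed once and reused; a real Chinese grammar would be far larger and would usually keep all ambiguous analyses rather than the first one found.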

Third, conclusion
Chinese information processing technology is highly significant. It is an organic integration of linguistics and information technology: the sound, form and meaning of Chinese are input into the computer and then processed as needed, drawing on a large body of cross-disciplinary knowledge from computer science, information science, acoustics and other fields. More specifically, language information processing covers all levels of natural language, including words, sentences, paragraphs and whole texts, and handles text, speech and image information in various ways, including input and output, compression, storage and retrieval. Natural language is the most important tool of everyday communication, a vehicle for human thought and an effective means of cultural transmission, so language information processing technology is of great importance. This paper has analyzed how computers process Chinese information and the main technologies involved. I hope it is useful to colleagues, and I look forward to further exchanges that help improve the technology.


