Analysis of Mainstream Chinese Information Processing Technology
Keywords: information processing; N-gram model; speech recognition; parsing
Abstract: This paper analyzes mainstream Chinese information processing technology, focusing on three important components: the N-gram model, speech recognition, and parsing.
1. Characteristics of Chinese Information Processing
(A) Particularity of Chinese characters
English has an advantage in computer information processing because its alphabet is small, which makes input, output, and processing straightforward. Chinese characters, by contrast, are numerous and relatively complex in shape, which makes encoding them considerably harder. Chinese information processing therefore encodes characters in different forms for different purposes. In summary, there are four kinds of code: input codes, standard (interchange) codes, internal codes, and glyph codes.
(B) Particularity of written Chinese
Another feature of Chinese is that written text has no explicit delimiter between words, which makes automatic word-level analysis of written Chinese a problem in its own right. Word segmentation means regrouping a continuous sequence of characters into words according to a given specification. In English, spaces delimit words. Written Chinese marks only characters, sentences, and paragraphs, so identifying word boundaries is one of the main difficulties. English also faces the problem of delimiting phrases, but in Chinese the problem arises at the word level, involves far more cases, and is therefore harder to handle.
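To make the segmentation difficulty concrete, the following Python sketch uses forward maximum matching, a common dictionary-based baseline. The paper does not describe any particular segmentation algorithm, so the choice of technique, the function name `forward_max_match`, the `max_len` parameter, and the toy lexicon are all assumptions made for illustration.

```python
def forward_max_match(text, lexicon, max_len=4):
    """Segment `text` greedily, left to right, preferring the longest lexicon word.

    Hypothetical helper for illustration only; `lexicon` is a set of known
    words and `max_len` is the longest word length to try.
    """
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted so the loop makes progress.
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words


# Toy lexicon (an assumption, not data from the paper).
lexicon = {"中文", "信息", "处理", "信息处理", "技术", "研究", "研究生", "生命", "起源"}

print(forward_max_match("中文信息处理技术", lexicon))
print(forward_max_match("研究生命起源", lexicon))
```

With this lexicon the second call yields 研究生 / 命 / 起源 rather than the intended 研究 / 生命 / 起源, because the greedy match takes the longer word 研究生 first. This is the kind of boundary ambiguity that makes Chinese word segmentation hard.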
(C) Particularity of Chinese speech
In speech, Chinese has a relatively simple syllable structure and relatively clear syllable boundaries. However, tone and tone change distinguish Chinese sharply from English, and this is a disadvantage for speech recognition and speech synthesis. On the whole, though, Chinese speech processing is still relatively easy compared with the other aspects of Chinese information processing.
(D) Particularity of Chinese grammar
In grammar, the syntactic function of a Chinese word is relatively hard to determine. Unlike English, whose words change form to mark their function, Chinese relies mainly on word order and function words to express different meanings. Without a good grasp of the syntax, ambiguity arises easily, so automatic analysis of Chinese sentences is an important technology that remains difficult to master.
2. Several Chinese Information Processing Technologies
(A) N-gram model
Let $w_i$ be any word in a text. If the two preceding words $w_{i-2} w_{i-1}$ are known, the conditional probability $P(w_i \mid w_{i-2} w_{i-1})$ can be used to predict $w_i$. This is the idea behind a statistical language model. More generally, let $W = w_1 w_2 \ldots w_n$ be any sequence of $n$ words in a text. A statistical language model gives the probability $P(W)$ that the sequence $W$ occurs in text. By the product rule of probability, $P(W)$ expands to:

$$P(W) = P(w_1)\, P(w_2 \mid w_1)\, P(w_3 \mid w_1 w_2) \cdots P(w_n \mid w_1 w_2 \ldots w_{n-1})$$

Predicting $w_n$ in this way requires its probability given all the words before it, which is too complex to compute. If we assume that the probability of any word $w_i$ depends only on the two words before it, the problem is greatly simplified. The resulting language model is called the trigram model:

$$P(W) \approx P(w_1)\, P(w_2 \mid w_1) \prod_{i=3}^{n} P(w_i \mid w_{i-2} w_{i-1})$$

where $\prod_{i=3}^{n}$ denotes the product of the probabilities for $i = 3, \ldots, n$. In general, an N-gram model assumes that the probability of the current word depends only on the $N-1$ words before it. Importantly, these probabilities are parameters that can be estimated from a large-scale corpus. For example, the trigram probability is:

$$P(w_i \mid w_{i-2} w_{i-1}) \approx \frac{\mathrm{count}(w_{i-2} w_{i-1} w_i)}{\mathrm{count}(w_{i-2} w_{i-1})}$$

where $\mathrm{count}(\ldots)$ is the number of times a particular word sequence appears in the entire corpus.
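The count-based estimate above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names `train_trigram` and `trigram_prob` and the toy corpus are invented for the example, the corpus is assumed to be already segmented into words, and no smoothing is applied, so unseen trigrams get probability 0.

```python
from collections import Counter


def train_trigram(sentences):
    """Count trigrams and their two-word contexts in a pre-segmented corpus.

    `sentences` is a list of word lists. Contexts are counted only where a
    following word exists, so the estimates for each context sum to 1.
    """
    trigram_counts = Counter()
    context_counts = Counter()
    for words in sentences:
        for i in range(2, len(words)):
            trigram_counts[(words[i - 2], words[i - 1], words[i])] += 1
            context_counts[(words[i - 2], words[i - 1])] += 1
    return trigram_counts, context_counts


def trigram_prob(trigram_counts, context_counts, w_prev2, w_prev1, w):
    """Estimate P(w | w_prev2 w_prev1) as count(trigram) / count(context)."""
    denom = context_counts[(w_prev2, w_prev1)]
    if denom == 0:
        return 0.0  # unseen context; a real system would smooth or back off
    return trigram_counts[(w_prev2, w_prev1, w)] / denom


# Toy corpus (an assumption, not data from the paper).
corpus = [
    ["我", "爱", "北京"],
    ["我", "爱", "上海"],
    ["我", "爱", "北京"],
]
tri, ctx = train_trigram(corpus)
print(trigram_prob(tri, ctx, "我", "爱", "北京"))
```

Here the context 我 爱 occurs three times and is followed by 北京 twice, so the estimate is 2/3. In practice the zero probabilities for unseen word sequences are the main weakness of the raw estimate, which is why large corpora are needed.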
(B) Speech recognition
The ultimate goal of speech recognition is truly free communication between humans and computers, in which the machine understands human speech and responds promptly and accurately. Speech recognition draws on signal processing, pattern recognition, probability theory and information theory, the mechanisms of speech production and hearing, and artificial intelligence. Its main techniques fall into three areas: feature extraction, pattern matching criteria, and model training. It also involves choosing the recognition unit, and for Chinese the syllable is usually used. As for feature parameter extraction, the speech signal carries a wealth of information, and the parameters extracted from it are usually called acoustic features. The quality of these feature parameters is key to speech recognition. Extraction should therefore capture as much as possible of the semantic information the utterance is meant to convey while filtering out interfering information specific to the individual speaker, so that the feature parameters are valid and accurate.
(C) Parsing
Syntactic analysis examines sentences on the basis of the grammatical features of Chinese, building a phrase structure tree and determining the relationships among the constituents of each sentence. The main questions are: which clauses make up the sentence; what syntactic role each clause plays; what grammatical structures larger than a single word each clause contains; what type each phrase is and what role it plays in the sentence; and finally, how all these constituents combine or attach to form a whole sentence. This is the main content of syntactic structure analysis, and the method is called chart parsing (literally, "line graph analysis"). It is worth noting that in English the subject must be placed before the predicate, otherwise the meaning changes completely, although inverted structures are also widely used in certain circumstances. In this respect English differs significantly from Chinese.
3. Conclusion
Chinese information processing technology is significant. It integrates linguistics with information technology: the sound, form, and meaning of Chinese are entered into the computer and then processed as needed, a process that draws on computer science, information science, acoustics, and much other interdisciplinary knowledge. Specifically, language information processing covers all units of natural language, including words, sentences, paragraphs, and whole texts, in their textual, spoken, and image forms, and handles their input and output, compression, storage, and retrieval. Natural language is the most important tool of daily communication, the carrier of human thought, and an effective vehicle for cultural transmission, which is why language information processing technology matters. This paper has analyzed how computers are used to process Chinese information and the technologies involved. I hope it is useful to colleagues, and I hope for more exchange so that the technology can be further improved.