


The developed SMT system is based on the Moses SMT system.

This paper details the design and implementation of a system that automatically builds Filipino and English text corpora, intended for use as the training, development, and test sets of a Filipino-English Statistical Machine Translation (SMT) system.

An initial result and contribution of the study to this field of research is a table of the most prevalent features of Philippine English in Filipino words, based on Bautista & Gonzalez (1992) and Borlongan (2007), vis-a-vis the most popular word-formation processes in Filipino-Philippine English, based on Zorc (1996), Abello (2002), Reyes-Otero (2002), and KWF (2004). The paper identifies the features exhibited by, and the word-formation processes involved in, the new words from Sawikaan's Words of the Year winners that were entered in the University of the Philippines' Diksiyunaryong Filipino (2010) and are at the same time standard words in the Oxford Dictionary of World English Online (2010). The exploratory-descriptive study tries to show that the two languages of the Philippine bilingual educational system, both languages in progress, are heading toward convergence, as it demonstrates empirically that Filipino is no different from, and is contributing to the expansion, development, and evolution of, Philippine English, since both are codified in the standard lexicons of the two languages.

The techniques used in this study are important in language education, serving to identify areas of confusion in language use with respect to grammar and orthography. Alternative forms of usage for each selected language rule were identified, and frequency counts were made, to serve as bases for a comparative analysis between the rules prescribed by standard reference books and actual language usage. A list of language rules on grammar and orthography was selected from standard reference books for each of the languages studied.
This study objectively analyzes the levels of agreement, in terms of grammar and orthographic rules, between reference books and actual usage as evidenced by web-mined text corpora for three major Philippine languages, namely Filipino, Cebuano-Visayan, and Ilokano. While such rules exist for some Philippine languages, the agreement and points of departure between the rules and actual usage need to be determined to avoid confusion.
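
The frequency-count comparison described above can be illustrated with a minimal sketch. It is not the study's actual implementation: the rule table, the rule name `greeting_spelling`, the example forms (`kumusta` as the prescribed spelling, `kamusta` as an alternative), and the function names are all hypothetical, chosen only to show how counts of alternative forms in a corpus might be turned into a level of agreement with a reference book. The sketch assumes single-word forms and simple whole-word, case-insensitive matching.

```python
import re
from collections import Counter

# Hypothetical rule table (an assumption for illustration, not from the study):
# each rule pairs the form prescribed by a reference book with alternative
# forms that may occur in actual usage.
RULES = {
    "greeting_spelling": {"prescribed": "kumusta", "alternatives": ["kamusta"]},
}


def count_forms(corpus_lines, rules):
    """Count whole-word, case-insensitive occurrences of every form per rule."""
    counts = {name: Counter() for name in rules}
    for line in corpus_lines:
        # Tokenize the line into lowercase word tokens and tally them once.
        token_counts = Counter(re.findall(r"\w+", line.lower()))
        for name, rule in rules.items():
            for form in [rule["prescribed"], *rule["alternatives"]]:
                counts[name][form] += token_counts[form]
    return counts


def agreement(counts, rules):
    """Return, per rule, the share of observed usage that matches the
    prescribed form, or None when no form of the rule occurs at all."""
    result = {}
    for name, rule in rules.items():
        total = sum(counts[name].values())
        if total:
            result[name] = counts[name][rule["prescribed"]] / total
        else:
            result[name] = None
    return result


if __name__ == "__main__":
    # Toy stand-in for a web-mined corpus.
    corpus = ["Kumusta ka?", "Kamusta na kayo", "kumusta po", "Magandang umaga"]
    form_counts = count_forms(corpus, RULES)
    print(dict(form_counts["greeting_spelling"]))
    print(agreement(form_counts, RULES))
```

A real analysis would also need multi-word and morphological patterns, which plain token matching does not cover; the sketch only shows the counting and comparison step.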

The implementation of Mother Tongue-Based Multilingual Education (MTBMLE) will require definitive rules for orthography and grammar.
