I am currently pursuing an MS, and for my final research project I want to develop a parallel corpus. I have the source-language text and its translation in the target language. What else do I need to do in order to develop the parallel corpus? Do I need to tokenize this data, or apply any other processing to the text?

--
Kind Regards,
Mr. Asad Abdul Malik
Research Student
Institute of Information Technology,
Kohat University of Science and Technology,
Kohat-26000, K.P.K, Pakistan