Anthony Hartley
Rikkyo University
Beibei He
Rikkyo University
Masao Utiyama
National Institute of Information and Communications Technology
Hitoshi Isahara
Toyohashi University of Technology
Eiichiro Sumita
National Institute of Information and Communications Technology
Vol. 4 No. 2 (2018), Articles, pages 141-161
Accepted: Jun 19, 2019
This paper describes challenges facing the ScrumSourcing project to create a neural machine translation (NMT) service aiding interaction between Japanese- and English-speaking fans during Rugby World Cup 2019 in Japan. This is an example of "domain adaptation". The best training data for adapting NMT consists of large volumes of translated sentences typical of the domain. In reality, however, such parallel data for rugby does not exist. The problem is compounded by a marked asymmetry between the two languages in conventions for post-match reports and by the almost total absence of in-match commentaries in Japanese. In post-editing the NMT output to incrementally improve quality via retraining, volunteer rugby fans will play a crucial role in determining a new genre in Japanese. To avoid de-motivating the volunteers at the outset, we undertake an initial adaptation of the system using terminological data. This paper describes the compilation of this data and its effects on the quality of the systems’ output.


