In Pavlick et al., we demonstrate an algorithm that is able to automatically adapt paraphrases to suit a particular domain. Here's a sample of what I have produced:
A large-scale evaluation of machine translation output by crowd workers, with a comprehensive comparison against expert annotators and recommendations for quality control (Callison-Burch et al.).
We automatically assign semantic entailment relations to all million entries in PPDB using features derived from past work on discovering inference rules from text and semantic taxonomy induction. PPDB is freely available from our web site paraphrase. It allows us to recognize that the laptop's screen can be rewritten as the screen of the laptop. Separating good paraphrases from bad presents fascinating research challenges (Pavlick et al.).
Figure 3: We partition paraphrases of an input word like bug into clusters representing its distinct senses.
However, this language independence does not mean that statistical machine translation works equally well for every language. These data sets are valuable for evaluating syntactic models of translation, since Indian languages are verb-final and require a lot of long-distance reordering (Post et al.). Thus, thrown into jail paraphrases not only as imprisoned, but also as arrested, detained, incarcerated, jailed, locked up, taken into custody, and thrown into prison. Vast quantities of bilingual training data allow us to extract a huge number of phrase pairs and to estimate associated probabilities. We produce high-quality sense clusters that represent a substantial improvement to PPDB. This builds on my research group's work on adding syntactic information to statistical machine translation rules.
In a parliamentary domain it more commonly refers to the divide between rich and poor, and should be paraphrased as gap, division, gulf, separate, distinction, rift, difference.
If successful, these efforts will radically change the field and make statistical machine translation applicable to nearly all of the world's languages. We build models of the annotators themselves, and use those models to create high quality labeled training data by soliciting redundant labels and making predictions about which labels and which annotators are most likely to be correct.
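The idea of modeling annotators and using redundant labels can be illustrated with a small sketch. This is not the model from the work described above; it is a simplified stand-in (iterative, reliability-weighted voting), and the function name `aggregate_labels`, the `(item, annotator, label)` input format, and the add-one smoothing are all assumptions made for illustration.

```python
from collections import defaultdict


def aggregate_labels(votes, iterations=10):
    """Estimate a consensus label per item and a reliability weight per annotator.

    votes: list of (item, annotator, label) triples (hypothetical input format).
    Returns (consensus, weight), both plain dicts.
    """
    annotators = {annotator for _, annotator, _ in votes}
    # Start by trusting every annotator equally.
    weight = {annotator: 1.0 for annotator in annotators}
    consensus = {}

    for _ in range(iterations):
        # Step 1: weighted vote for each item, using current annotator weights.
        scores = defaultdict(lambda: defaultdict(float))
        for item, annotator, label in votes:
            scores[item][label] += weight[annotator]
        # Sorting the labels makes tie-breaking deterministic.
        consensus = {
            item: max(sorted(tally), key=tally.get)
            for item, tally in scores.items()
        }

        # Step 2: re-estimate each annotator's weight as their smoothed
        # rate of agreement with the current consensus.
        agree = defaultdict(int)
        total = defaultdict(int)
        for item, annotator, label in votes:
            total[annotator] += 1
            agree[annotator] += int(label == consensus[item])
        weight = {a: (agree[a] + 1) / (total[a] + 2) for a in annotators}

    return consensus, weight


if __name__ == "__main__":
    # Toy data: annotators "a" and "b" agree with each other; "c" disagrees.
    toy_votes = [
        ("s1", "a", "good"), ("s1", "b", "good"), ("s1", "c", "bad"),
        ("s2", "a", "bad"), ("s2", "b", "bad"), ("s2", "c", "good"),
        ("s3", "a", "good"), ("s3", "c", "bad"),
    ]
    labels, reliabilities = aggregate_labels(toy_votes)
    print(labels)
    print(reliabilities)
```

In the toy data, item `s3` has one vote each way, so an unweighted majority cannot decide it; once the weights reflect how often each annotator agrees with the others, the vote from the more consistent annotator wins.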
Thus a search for paraphrases of the noun bug would yield a single list of paraphrases that includes insect, glitch, beetle, error, microbe, wire, cockroach, malfunction, microphone, mosquito, virus, tracker, pest, informer, snitch, parasite, bacterium, fault, mistake, failure and many others.
It shows that through judicious application of quality control techniques, crowdsourced translations can fall in the range that we would expect of professional translators. Table 1 shows a variety of other meaning-preserving structural transformations that we learn in this way (Ganitkevitch et al.). We have adopted a synchronous context-free grammar (SCFG) representation for our Joshua decoder, and we demonstrated that it is useful for translating between languages with different word orders, like Urdu's subject-object-verb order and English's subject-verb-object order (Baker et al.). However, not all the paraphrases are uniformly good.
A demographic study of the languages spoken by workers on the Mechanical Turk crowdsourcing platform, which resulted in bilingual dictionaries for languages with 10, words translated in each of the languages (Pavlick et al.).