The slang people use on social networks can be hard enough to understand, so imagine how difficult it is to translate. Microsoft's research team has been studying how social media slang should be translated. The team has made a lot of progress, and the upcoming Skype translator will be able to talk more like a real human. Many researchers believe that social media could be an essential ingredient in helping machines understand human language.
Dan Jurafsky, an expert in computational linguistics at Stanford, said that the social media experiments are an example of a new line of research in computational social science. He also said that this work shows how social meaning can be extracted from speech and text while performing a complex natural task.
We can expect the beta version of Skype's translation app later this year; it was already demonstrated in May at the Code Conference. The app provides multilingual services during a real-time conversation, and its features were demoed by Gurdeep Singh Pall, corporate vice president of Skype and Lync at Microsoft. While Pall was speaking in English, German and English translations were displayed at the bottom of the screen, accompanied by real-time audio translation.
Speech and text
The software system incorporates several different technologies, including speech recognition, speech synthesis and machine translation. According to Vikram Dendi, a technical and strategy advisor at Microsoft Research, previous attempts to combine these technologies had little success because spoken language differs so much from written language. For example, when we speak, we fill pauses with “ums” and “ahs”. Spoken language also relies on prosodic features to express different meanings, such as sarcasm, or to distinguish an order from a question (You’re picking up the kids! / You’re picking up the kids?). It is hard to imagine a machine that could accurately translate what was implied in a spoken sentence.
English to German Translation
Before the research with social media began, Microsoft's translation system used data from published books that had been translated. This data was fed into a machine learning pipeline known as phrasal statistical machine translation (phrasal SMT). The system works by segmenting sentences into smaller word sequences called n-grams, where n is the number of words in each sequence.
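To make the idea of segmenting a sentence concrete, here is a minimal sketch of n-gram extraction. The function name `ngrams` and the whitespace-based word splitting are assumptions made for illustration; the article does not describe how Microsoft's pipeline actually tokenizes text.

```python
def ngrams(sentence, n):
    """Return every contiguous run of n words in the sentence."""
    # Assumption: words are separated by whitespace. A real system
    # would use a proper tokenizer.
    words = sentence.split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


# Two-word sequences (bigrams) from a short sentence.
bigrams = ngrams("you are picking up the kids", 2)
print(bigrams)
```

A six-word sentence yields five overlapping two-word sequences, and a sentence shorter than n yields none.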
When a sentence is translated from English to German, each English n-gram is mapped to its corresponding German n-gram, so in a way the whole process teaches a computer how to find corresponding phrases. Whenever the software encounters a new, untranslated phrase, it calculates the most probable translation from its n-gram database and offers that as the solution to the translation problem.
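The lookup described above can be sketched as a small phrase table that counts how often each English phrase was paired with each German phrase, then picks the most frequent pairing. Everything here is hypothetical: the sample pairs, the names `build_phrase_table` and `most_probable`, and the use of simple relative frequency as the probability estimate. Real phrasal SMT systems combine several models rather than a single count.

```python
from collections import Counter, defaultdict

# Hypothetical aligned phrase pairs, standing in for data that would
# be extracted from translated books.
ALIGNED_PAIRS = [
    ("thank you", "danke"),
    ("thank you", "danke"),
    ("thank you", "vielen Dank"),
    ("good morning", "guten Morgen"),
]


def build_phrase_table(pairs):
    """Count how often each source phrase was aligned to each target phrase."""
    table = defaultdict(Counter)
    for source, target in pairs:
        table[source][target] += 1
    return table


def most_probable(table, phrase):
    """Return (translation, relative frequency), or None if the phrase is unknown."""
    counts = table.get(phrase)
    if not counts:
        return None
    target, count = counts.most_common(1)[0]
    return target, count / sum(counts.values())


table = build_phrase_table(ALIGNED_PAIRS)
best = most_probable(table, "thank you")
unknown = most_probable(table, "see you")
```

In this toy data, "danke" wins for "thank you" because it appears in two of the three observed pairings, while a phrase that was never seen has no entry at all.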
The program can already translate common phrases in several different languages, even when the word order differs from the original. However, when the system encounters an uncommon phrase with a different word order, it tends to get confused.
This confusion arises because SMT does not comprehend grammar. Whenever it has to shift between different grammar rules, the system may not respond with full accuracy. Take the difference between English and Japanese, for example. In English, the grammatical order is subject, verb, object, whereas in Japanese it is subject, object, verb. A solution to this hindrance may lie in a new program Microsoft is developing: syntactic SMT (syntactically informed phrasal statistical machine translation).
Syntactic SMT is, basically, software that can comprehend the syntax of different languages. Instead of translating word for word, the software finds the appropriate place for each word in the sentence while translating. So far, this is the best approach to the problem, though there is still room for improvement. Still, it is an undeniable leap in this area, and by studying the slang used in social media and incorporating it into the program, this software may well grow into something impressive once the final version is released.
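As a rough illustration of placing words according to grammar rather than translating them in their original order, the toy sketch below rearranges a subject-verb-object sentence into subject-object-verb order. It assumes each word already carries a grammatical role label, which a real system would have to obtain from a parser. The function name and the role labels are hypothetical and are not taken from Microsoft's software.

```python
def reorder_svo_to_sov(tagged_words):
    """Rearrange (word, role) pairs from subject-verb-object
    into subject-object-verb order."""
    # Assumption: exactly one word per role, already labeled.
    by_role = {role: word for word, role in tagged_words}
    return [by_role["subject"], by_role["object"], by_role["verb"]]


english = [("I", "subject"), ("eat", "verb"), ("apples", "object")]
reordered = reorder_svo_to_sov(english)
print(reordered)
```

Here the verb moves to the end of the sentence, which is where a Japanese translation would place it.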