"The idea of our project has a huge reach," say students working on translating sign language into text

In June, Johns Hopkins University will host the landmark Annual Frederick Jelinek Memorial Summer Workshop, which will include representatives from the Faculty of Applied Sciences (FAS). For six weeks, they will focus on speech and language technologies.

From 24 June to 2 August 2024, teachers and students from the FAS will be part of an international team led by Marek Hrúz from the Department of Cybernetics. Graduate and doctoral students will accompany him to the USA. While the workshop is scientifically oriented, it also gives young people interested in the field a chance to meet colleagues with the same or similar interests. "In addition to the lecturers and students of the FAS, our team will also include students from the Faculty of Mathematics and Physics of Charles University, as well as from the USA, Turkey, and Ghana. In total, approximately fifteen to twenty people," says Ondřej Valach, a student in the Master's degree program.

The FAS team's journey to Johns Hopkins University was kick-started by Marek Hrúz's successful project on sign language recognition. The system he designed tackles video-to-text recognition, where the output text is a transcription of the corresponding spoken language. The whole task is therefore framed as translation. "Based on this proposal, I was invited. I incorporated the comments of the expert committee into the original proposal and developed the final version of the project," adds Marek Hrúz.

What is innovative about the whole project is the incorporation of large language models, or LLMs, which have received a lot of attention in the last few years. "These models could be connected to image recognition and take advantage of the fact that the model already has a general knowledge of how language works," says another student, Václav Javorek, Ondřej's classmate. "LLMs have a lot of potential, but they are also quite expensive to train," adds Václav.

"The idea of our project has a huge reach. If everything works out, it could kick-start further research in the field of sign language. There are also plans for translation in the opposite direction, i.e. sign language synthesis. A hearing person who does not know sign language could thus easily communicate with a deaf person, because an avatar would sign the spoken words, for example on a tablet. Or it would be possible to watch any video on, say, YouTube, with the avatar translating it into sign language, and the viewer would not have to worry about whether the video has subtitles. But the biggest problem is the naturalness of the avatar's movement. If the signing is jerky, deaf people find it distracting to watch and prefer subtitles," the FAS students agree.