Marek Hrúz from FAS will lead the international team at Johns Hopkins University

Experts from the Czech Republic, Turkey, the USA, and Ghana will address the problem of sign language recognition and its transcription into text.

The prestigious Johns Hopkins University in Baltimore will host a workshop on speech and language technologies from 24 June to 2 August, with Marek Hrúz from the Department of Cybernetics of the Faculty of Applied Sciences (FAS) and the NTIS Research Centre as a speaker. The workshop will be preceded by a two-week summer school, during which the Pilsen-based researcher from the University of West Bohemia (ZČU) will lecture American students on the technologies he plans to use in his proposed project: sign language recognition and its transcription into text.

The projects and topics addressed in the workshops go through a demanding approval process. The original one-page proposal by Marek Hrúz described a video-to-text sign language recognition task, where the text corresponds to a transcription of spoken language. Thus, the whole task is conceived as a translation. "In the project proposal, I focused on the analysis of detected pose, which we had already addressed with Matyáš Boháček, who is now studying at Stanford. My motivation to write this project was to enable translation on mobile devices. It brings faster processing, less data, etc.," says Marek Hrúz.

The submitted concept was well received, and Marek Hrúz was invited to a personal meeting at Johns Hopkins University in November 2023, which Matyáš Boháček also attended. Also invited was Murat Saraçlar, an expert from Boğaziçi University in Istanbul with extensive experience in speech and sign language recognition. Based on his comments, the original proposal was amended and presented, along with four other proposals, to a plenary panel of experts, who evaluated the proposals, discussed them with the project leaders, and further modified them. "Our project was enriched with the idea of using large language models (LLMs). This recommendation was proposed by Florian Metze (Meta, Carnegie Mellon University), based on which I developed the final version of the project," adds Marek Hrúz.

The committee awarded the FAS researcher's project first place. "Not a single member of the committee commented negatively on the support. This rare phenomenon indicates a great interest in sign language processing in the AI community. This is a truly great success for our department," adds Marek Hrúz. The FAS team's project combines the aforementioned LLMs with speech models, sign language recognition and synthesis, and image processing.

What is the advantage of using LLMs? These models already have knowledge about the world, including sign language. "ChatGPT can, for example, describe how to perform individual signs - hand shape, movements, etc. LLMs understand the 'logic of the world', so they can 'hallucinate' knowledge: infer facts based on logic, even if they are not directly observed in the data," explains Marek Hrúz.

The project's budget is currently being finalized, with Johns Hopkins University seeking sponsors among major companies such as Amazon, Meta, Google, and Microsoft. At the same time, the team's composition is being settled; it will consist of senior and junior researchers, PhD students, and even two graduate students from FAS, and will include experts from the Czech Republic, Turkey, the USA, and Ghana. The Department of Cybernetics at FAS is preparing the technology needed to implement the project successfully during the summer. "We are discussing this in the department as a broader team and it shows a big range of expertise in our AI department," says Marek Hrúz, who will lead the international team.

In the future, the AI experts see potential in teaming up with colleagues from the robotics and automation department to program a humanoid robot to communicate in sign language. The workshop aims to connect people internationally so that excellent new projects can be developed.

The 2024 Tenth Jelinek Summer Workshop on Speech and Language Technology is held in honour of Bedřich (Frederick) Jelinek, a Czechoslovak-born scientist who worked in computer science and speech analysis. Until his death in 2010, he was a professor at JHU and director of its Center for Language and Speech Processing. In 2014, the workshop was held in Prague at the Faculty of Mathematics and Physics of Charles University.


Marek Hrúz speaks at Technica Futura Conference.

Matyáš Boháček.

Martina Batková

25. 04. 2024