Munkhtsetseg Namsrai, Institute of Language and Literature, Mongolian Academy of Sciences, Ulaanbaatar, Mongolia
This study examines whether the terminology management tool ProTerm is appropriate for our need to manage Mongolian terminology. Although there are plenty of terminology management tools, most of them do not support Asian languages, including Mongolian; they support only well-resourced languages such as English, German, French, Spanish and a few others. Since manual collection of terms is highly laborious, costly and time-consuming, we need an appropriate terminology management tool to lighten this heavy workload. I therefore tested whether ProTerm can help us manage terms in Mongolian. The results of the experiment show that ProTerm can be a time-saving and efficient tool for our terminology work, since it extracted more acceptable terms than I expected and allows us to create a termbase with our desired terminology entries.
termbank, terminology management system, terminology extraction, Mongolian term, terminology resources, noise, silence
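Noise and silence, listed among the keywords, are the usual measures for scoring term extraction: noise is the share of extracted candidates that are not valid terms, and silence is the share of valid terms the tool failed to extract. The abstract does not say how ProTerm's output was scored, so the following is only a minimal Python sketch, assuming a manually validated reference list; the function name and the sample Mongolian terms are illustrative.

```python
def noise_and_silence(extracted, reference):
    """Return (noise, silence) for a term-extraction run.

    noise   = extracted candidates that are not valid terms / all extracted candidates
    silence = valid reference terms that were not extracted / all reference terms
    """
    extracted, reference = set(extracted), set(reference)
    if not extracted or not reference:
        raise ValueError("both term lists must be non-empty")
    correct = extracted & reference  # candidates confirmed by the reference list
    noise = (len(extracted) - len(correct)) / len(extracted)
    silence = (len(reference) - len(correct)) / len(reference)
    return noise, silence


# Hypothetical data: a validated reference list and a tool's candidate list.
reference_terms = {"нэр томьёо", "толь бичиг", "хэл шинжлэл", "үгсийн сан"}
extracted_terms = {"нэр томьёо", "толь бичиг", "ба"}  # "ба" ("and") is not a term

noise, silence = noise_and_silence(extracted_terms, reference_terms)
print(f"noise={noise:.2f}, silence={silence:.2f}")  # noise=0.33, silence=0.50
```

Here two of the three candidates are valid terms (a noise of one third), and two of the four reference terms were found (a silence of 0.5).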
Mohamed Amine Menacer and Kamel Smaïli, Université de Lorraine, CNRS, LORIA, F-54000 Nancy, France
The Arabic language has many varieties, including its standard form, Modern Standard Arabic (MSA), and its spoken forms, namely the dialects. These dialects are representative examples of under-resourced languages for which automatic speech recognition is still considered an unresolved problem. To address this problem, we recorded several hours of spoken Algerian dialect and used them to train a baseline model. This model was then improved by exploiting other languages that influence this dialect: their data were integrated into one large corpus, and three approaches were investigated: multilingual training, multitask learning and transfer learning. The best performance was achieved using a limited and balanced amount of acoustic data from each additional language, relative to the data size of the studied dialect. This approach led to an improvement of 3.8% in word error rate compared with the baseline system trained only on the dialect data.
Automatic speech recognition, Algerian dialect, MSA, multilingual training, multitask learning, transfer learning
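Word error rate, the metric behind the reported 3.8% improvement, is the word-level edit distance (substitutions plus deletions plus insertions) between a recognizer's hypothesis and the reference transcript, divided by the number of reference words. A minimal Python sketch of that standard definition follows; the function name is illustrative, and the abstract does not state whether the 3.8% is an absolute or a relative reduction.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words (one row of the Levenshtein table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost is 0)
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```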
Zhihao Zheng, Yao Zhang, Vinay Gurram, Jose Salazar Useche, Isabella Roth, Yi Hu, Department of Computer Science, Northern Kentucky University, Highland Heights, Kentucky USA 41099
At present, development and innovation in any business or engineering field are inseparable from the computer and network infrastructure that supports the core business. The world has entered an era of rapid development of information technology. Every year, more individuals and companies start using cloud storage and other cloud services for computing and information storage. Therefore, the security of sensitive information in the cloud has become an important challenge that needs to be addressed. Cloud authentication is a special form of authentication for today's enterprise IT infrastructure. Cloud applications communicate with an LDAP server, which could be an on-premises directory server or an identity management service running in the cloud. Due to the complex nature of cloud authentication, an effective and fast authentication scheme is required for successful cloud applications. In this study, we designed several cloud authorization schemes to integrate an on-premises or cloud-based directory service with a cloud application. We also discuss the pros and cons of the different approaches to illustrate best practices on this topic.
Cloud Application Authentication, Identity Management in Cloud, IAM
Olu Amusan, Dominic Carrillo, Luke Hillard, Department of Computer Science and Engineering, University of North Texas
Current research in the automotive industry is pushing object detection to new heights by adding real-time detection. The accuracy of most classification models is not adequate when detecting vehicles in many different scenarios, especially in real time. More directly, what we aim to reduce is the loss of objects between frames. We propose to train our model using diverse images of vehicles in different scenarios. We will use a novel approach of creating composite frames in the training data by overlaying the previous one and two frames on the original frame. Experimental results will demonstrate how our classification models improve in detection with this novel training data approach. We hope that changing the training data in this way will make autonomous driving safer.
autonomous vehicles, YOLOv4, real-time detection, object detection
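The abstract does not specify how the previous frames are overlaid. One plausible reading, sketched below in Python with NumPy, is a weighted pixel blend of frame t with frames t-1 and t-2; the function name and the blend weights are hypothetical choices, not the authors' settings.

```python
import numpy as np


def composite_frame(frames, t, weights):
    """Blend frame t with its predecessors into one training image.

    weights[0] applies to frame t, weights[k] to frame t-k, so two weights
    overlay the previous frame and three weights overlay the previous two.
    """
    if not 1 < len(weights) <= t + 1:
        raise ValueError("need at least one previous frame and enough history")
    if not np.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to 1 to keep the intensity range")
    acc = np.zeros(frames[t].shape, dtype=np.float64)
    for k, w in enumerate(weights):
        acc += w * frames[t - k].astype(np.float64)
    return np.clip(np.rint(acc), 0, 255).astype(np.uint8)


# Three tiny constant-colour "frames" standing in for video frames (H x W x 3).
frames = [np.full((2, 2, 3), v, dtype=np.uint8) for v in (100, 200, 50)]
one_prev = composite_frame(frames, t=2, weights=(0.6, 0.4))       # frames 2, 1
two_prev = composite_frame(frames, t=2, weights=(0.5, 0.3, 0.2))  # frames 2, 1, 0
print(one_prev[0, 0, 0], two_prev[0, 0, 0])
```

Passing two weights overlays only the previous frame; passing three overlays the previous two, matching the two variants the abstract mentions.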
Kaustuv Kunal, Littilabs.com, India
Serverless architectures are cost-effective, fast and reliable, and they require less maintenance. Building such systems for a big data setup is a challenging task, especially for start-ups. The paper proposes a baseline serverless, large-scale, end-to-end batch log processing architecture for data analytics and modeling, followed by a case study. The four-layer FaaS architecture is effective, low-maintenance and inexpensive, and it can be set up on any public cloud. Apart from the typical serverless advantages, it also aids in data management and user profiling.
Big Data Processing, Cloud Computing, Serverless Architecture, Batch Processing, Public Cloud
Alberto F. de Oliveira Jr1*†, Marcelo Querino Lima Afonso3†, Manuel Lemos1, Noel Lopes2, 1 - Universidade da Beira Interior, R. Marquês de Ávila e Bolama, 6201-001, Covilhã/Portugal, 2 - Instituto Politécnico da Guarda, Av.ª Dr. Francisco Sá Carneiro, n.º 50, 6300-559, Guarda/Portugal, 3 - Universidade Federal de Minas Gerais, Av. Pres. Antônio Carlos, 6627 - Pampulha, 31270-901, Belo Horizonte - Minas Gerais/Brazil
Oxytocin and vasopressin are neurohypophysial hormones that together form a family of structurally and functionally related peptide hormones. However, the biological function of these hormones may vary depending on their taxonomic classification. In our study, using a broad set of bioinformatics and machine learning techniques, we described the role of sets of coevolved amino acids in determining the taxonomic classes of neurohypophysial hormone sequences. Moreover, certain taxonomic classes could still be classified from the presence of specific amino acids in these coevolved sets, shedding more light on how molecular evolution can describe structure and function.
oxytocin, vasopressin, evolution, coevolution of amino acids, coevolved sets, machine learning, molecular phylogeny, neurohypophysial hormones.
Sabrina Luftensteiner and Michael Zwick, Software Competence Center Hagenberg, Hagenberg, Austria
Recently, the amount of data available from industrial processes has been increasing rapidly. This trend is driven by machines being increasingly equipped with sensors, which continuously produce data for further analysis and processing. This paper proposes a framework for improving offline learning models through the use of such online data, focusing on minimizing the catastrophic forgetting that arises in online learning scenarios. The framework incorporates several state-of-the-art methods in deep learning and machine learning and enables simple comparisons between them. The methods range from memory-based approaches to loss calculation methods and optimizers in deep learning. The proposed framework is specifically tailored to regression problems in the industrial field. It can cope with single-task as well as multi-task models and is easily extensible. Furthermore, it offers various configuration options for adapting it to a given problem.
Online Learning, Catastrophic Forgetting, Regression, Domain Adaptation.
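Replay memory is one common example of the memory-based approaches mentioned above: a small buffer of past samples is mixed into each online update so that the model keeps seeing older data. The sketch below is a generic Python illustration using reservoir sampling, which keeps every sample seen so far with equal probability; the class and method names are hypothetical and are not taken from the authors' framework.

```python
import random


class ReservoirReplayBuffer:
    """Fixed-size memory of past samples, filled by reservoir sampling."""

    def __init__(self, capacity, seed=None):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.memory = []
        self.seen = 0  # number of samples offered to the buffer so far
        self._rng = random.Random(seed)

    def add(self, sample):
        """Store a sample so that every sample seen is kept with equal probability."""
        self.seen += 1
        if len(self.memory) < self.capacity:
            self.memory.append(sample)
        else:
            j = self._rng.randrange(self.seen)  # uniform index in [0, seen)
            if j < self.capacity:
                self.memory[j] = sample

    def mixed_batch(self, new_samples, k):
        """Return the incoming samples plus up to k replayed old samples."""
        replay = self._rng.sample(self.memory, min(k, len(self.memory)))
        return list(new_samples) + replay


# Simulated sensor stream of (features, target) pairs for a regression model.
buffer = ReservoirReplayBuffer(capacity=50, seed=0)
for step in range(1000):
    x = [step, step % 7]
    y = 0.5 * step
    batch = buffer.mixed_batch([(x, y)], k=8)  # train on this batch here
    buffer.add((x, y))
print(len(buffer.memory), buffer.seen, len(batch))
```

Sampling the replay batch before adding the new sample keeps the current sample from being replayed against itself in the same update.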
Sareh Aghaei and Anna Fensel, Semantic Technology Institute (STI) Innsbruck, Department of Computer Science, University of Innsbruck, Innsbruck, Austria
Finding similar entities among knowledge graphs is an essential research problem for knowledge integration and knowledge graph connection. This paper aims at finding semantically similar entities between two knowledge graphs, which can help end users and search agents access pertinent information across knowledge graphs more effectively and easily. Given a query entity in one knowledge graph (first KG), the proposed approach tries to find the most similar entity in another knowledge graph (second KG). The main idea is to leverage graph embedding, clustering, regression and sentence embedding. In this approach, RDF2Vec is employed to generate vector representations of all entities of the second knowledge graph, and the vectors are then clustered based on cosine similarity using the K-medoids algorithm. Next, an artificial neural network with a multilayer perceptron topology is used as the regression model to predict the corresponding vector in the second knowledge graph for a given vector from the first knowledge graph. After the cluster of the predicted vector is determined, the entities of that cluster are ranked with Sentence-BERT, and the entity with the highest rank is chosen as the most similar one. To evaluate the proposed approach, extensive experiments were conducted on real-world knowledge graphs. The experimental results demonstrate the effectiveness of the proposed approach.
Knowledge Graph, Similar Entity, Graph Embedding, Clustering, Regression, Sentence Embedding.
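The clustering and cluster-lookup steps of this pipeline can be sketched in Python with NumPy as follows. This is an illustrative approximation, not the authors' code: it uses the simple alternating (Voronoi-iteration) variant of K-medoids rather than full PAM, toy 2-D vectors stand in for RDF2Vec embeddings, and the predicted vector is assumed to come from the regression model. The Sentence-BERT ranking of the returned candidates is not reproduced, and all function names are hypothetical.

```python
import numpy as np


def cosine_distance_matrix(a, b):
    """Pairwise cosine distance (1 - cosine similarity) between rows of a and b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return 1.0 - a @ b.T


def k_medoids(x, k, max_iter=100, seed=0):
    """Alternating K-medoids under cosine distance; returns (medoid indices, labels)."""
    if not 0 < k <= len(x):
        raise ValueError("k must be between 1 and the number of vectors")
    d = cosine_distance_matrix(x, x)
    medoids = np.random.default_rng(seed).choice(len(x), size=k, replace=False)
    for _ in range(max_iter):
        labels = np.argmin(d[:, medoids], axis=1)  # assign to nearest medoid
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue  # keep the old medoid if its cluster is empty
            # New medoid: the member with the smallest total distance to the others.
            within = d[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(d[:, medoids], axis=1)
    return medoids, labels


def candidate_entities(pred_vec, x, medoids, labels):
    """Indices of second-KG entities in the cluster whose medoid is nearest to pred_vec."""
    dist = cosine_distance_matrix(pred_vec[None, :], x[medoids])[0]
    return np.flatnonzero(labels == int(np.argmin(dist)))


# Toy 2-D "embeddings": three entities near the x-axis, three near the y-axis.
x = np.array([[1.0, 0.05], [1.0, 0.1], [0.9, 0.0],
              [0.05, 1.0], [0.1, 1.0], [0.0, 0.9]])
medoids, labels = k_medoids(x, k=2)
pred = np.array([0.95, 0.02])  # stands in for the regression model's output
print(candidate_entities(pred, x, medoids, labels))  # entities to rank next
```

Restricting the final ranking to one cluster means the comparatively expensive sentence-embedding comparison only runs over that cluster's entities rather than the whole second knowledge graph.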