



  • Essay / Literature Review on Device Control for Smart Homes Using Voice Recognition via Mobile Phone

    Introduction

    Many devices in daily life are operated by remote controls. Elderly and disabled people often find it hard to tell the remote controls of different devices apart, because most of them have the same shape and size. Visually impaired people, for example, must first work out which remote control belongs to which household appliance, and then identify the target device itself. After reviewing many articles, I found that this problem can be addressed with devices controlled through gesture and voice recognition.

    One solution that comes to mind is the smart home. It is a great concept to implement, but can you imagine what it would cost? It is a great innovation, yet in practice it remains largely limited to wealthy households. If we want real transformation, we need to make it accessible and affordable for every economic group.

    Many solutions to this problem are possible, so why are such houses still not very popular? The reason is simple: cost. Instead of building a new product, we can make the system much cheaper by using existing resources. Mobile phones are now the most common gadgets at home, and we can make good use of them because they already come with a microphone.

    Literature Survey

    Personalized Speech Recognition on Mobile Devices, Ian McGraw, Rohit Prabhavalkar, Raziel Alvarez, Montse Gonzalez Arenas, Kanishka Rao, David Rybach, Ouais Alsharif, Hasim Sak, Alexander Gruenstein, Françoise Beaufays, Carolina Parada. In this paper, the authors build a speech recognition system that is accurate and low-latency, with a small memory and computational footprint, so that it runs fast on Android devices.
Quantization was applied to a long short-term memory (LSTM) acoustic model trained with connectionist temporal classification (CTC) to directly predict phoneme targets. Its memory footprint was further reduced by an SVD-based compression scheme. The basic idea is to combine quantized deep neural networks (DNNs) with on-the-fly language model rescoring to achieve real-time performance on modern smartphones. To keep word error rate (WER) and latency low under tight memory and computational constraints, the paper uses LSTM recurrent neural networks (RNNs) trained with CTC and state-level minimum Bayes risk (sMBR) techniques, which yield accurate models. The LSTMs are made small and fast enough by quantizing the parameters to 8 bits, by using context-independent (CI) phone outputs instead of the more numerous context-dependent (CD) phone outputs, and by applying singular value decomposition (SVD) compression.

Acoustic models are trained on 3M hand-transcribed, anonymized utterances extracted from Google voice search traffic (approximately 2,000 hours). All models are trained using asynchronous distributed stochastic gradient descent (ASGD). To improve robustness to noise, the authors generated "multi-style" training data by distorting each training utterance using a room simulator with a virtual noise source, producing 20 distorted versions of each utterance. Noise samples were mined from YouTube videos and environmental recordings of everyday events. To further reduce memory consumption, they compressed the acoustic models using projection layers between the outputs of an LSTM layer and the recurrent and non-recurrent inputs of the same and subsequent layers. Adapting the acoustic models with the multi-style training data described above yields an additional relative improvement of 12.8% over SVD compression.
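The 8-bit parameter quantization mentioned above can be illustrated with a small sketch. This is a generic uniform (linear) quantizer written for illustration only; it is an assumption that it resembles the scheme in the paper, and the names `quantize`, `dequantize`, and the sample weights are hypothetical.

```python
import struct

def quantize(weights):
    """Map floats to 8-bit codes with a uniform (linear) quantizer.

    Illustrative only: the paper's exact quantization scheme may differ.
    """
    lo, hi = min(weights), max(weights)
    # Step size between adjacent codes; fall back to 1.0 for constant input.
    scale = (hi - lo) / 255 or 1.0
    # Clamp to 0..255 to guard against floating-point rounding at the edges.
    codes = bytes(min(255, max(0, round((w - lo) / scale))) for w in weights)
    return codes, lo, scale

def dequantize(codes, lo, scale):
    """Recover approximate float values from the 8-bit codes."""
    return [lo + c * scale for c in codes]

# Hypothetical weights standing in for one row of a model matrix.
weights = [-0.82, -0.11, 0.0, 0.27, 0.93]
codes, lo, scale = quantize(weights)
restored = dequantize(codes, lo, scale)

# 32-bit floats take 4 bytes each; the quantized codes take 1 byte each.
float_bytes = len(struct.pack(f"{len(weights)}f", *weights))
quant_bytes = len(codes)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Each parameter shrinks from 4 bytes to 1 byte, at the price of a rounding error of at most half a quantization step per value (plus two stored floats, `lo` and `scale`, per quantized group).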
Because the 11.9 MB floating-point neural network acoustic model consumes a significant portion of memory and processing time, quantizing the model parameters into an 8-bit integer representation had an immediate impact on memory usage, reducing the acoustic model footprint to a quarter of its original size. The final acoustic model footprint was 3 MB. For on-device language modeling, the focus is on building a compact language model for dictation and voice commands. To keep the system footprint small, the authors trained a single model for both domains. They also limited the vocabulary size to 64K words. The language models are trained on unsupervised speech logs from the dictation domain (∼100M utterances) and the voice command domain (∼2M utterances).

The result is a compact large-vocabulary speech recognition system that runs efficiently on mobile devices, accurately and with low latency. This was achieved with a CTC-based LSTM acoustic model that predicts context-independent phones and is compressed to a tenth of its original size using a combination of SVD-based compression and quantization. For efficient decoding, the authors use an on-the-fly rescoring strategy, followed by additional optimizations for CTC models that reduce computation and memory usage. Combining these techniques gives a system that runs 7x faster than real time on a Nexus 5, with a total system footprint of 20.3 MB.

Remote control system for home appliances using voice recognition, Noriyuki Kawarazaki and Tadashi Yoshidome. In this paper, the authors developed a remote control system for electrical appliances using voice recognition. The system is composed of a PMRC, a PC, a microphone, and a speaker. The PMRC is a programmable multiple remote control that can memorize the functions of many remote controls, so a single device can do the work of several remote controls. It has infrared LEDs mounted all around.
These infrared LEDs send infrared signals in all directions, so the user does not have to worry about the position of the PMRC. When a user gives a voice command to the system, the PMRC sends the infrared signal to the home appliance. The system then replies to the operator with a text-to-speech message. To keep the interface user-friendly, many of the voice commands for remote control operations are expressed as sentences. To recognize these sentence-based voice commands, the system uses voice recognition software and morphological analysis software. “Julius” is a well-known free speech recognition software intended for researchers. “MeCab” segments Japanese sentences into sequences of.
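The command flow described above (a recognized sentence is split into words, a matching infrared signal is sent, and a spoken confirmation is returned) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the token list stands in for the output of the speech recognizer and morphological analyzer, English words are used for readability, and `IR_CODES`, `APPLIANCES`, `ACTIONS`, and `interpret` are hypothetical names with placeholder code values.

```python
# Hypothetical lookup table: (appliance, action) -> placeholder IR code
# that a programmable remote control would emit. Values are made up.
IR_CODES = {
    ("tv", "on"): 0x01,
    ("tv", "off"): 0x02,
    ("light", "on"): 0x11,
    ("light", "off"): 0x12,
}

APPLIANCES = {appliance for appliance, _ in IR_CODES}
ACTIONS = {action for _, action in IR_CODES}

def interpret(tokens):
    """Turn a tokenized sentence into (ir_code, spoken_reply).

    `tokens` is assumed to be the lowercase word sequence produced by
    speech recognition followed by morphological analysis.
    """
    appliance = next((t for t in tokens if t in APPLIANCES), None)
    action = next((t for t in tokens if t in ACTIONS), None)
    if appliance is None or action is None:
        return None, "Sorry, I did not understand the command."
    code = IR_CODES.get((appliance, action))
    if code is None:
        return None, f"The {appliance} cannot be turned {action}."
    return code, f"Turning {action} the {appliance}."

# Example: a full sentence rather than a fixed keyword command.
code, reply = interpret("please turn on the tv".split())
```

Matching on keywords inside a whole sentence, rather than requiring one fixed phrase, is what lets the user speak naturally; the reply string is what a text-to-speech component would read back to the operator.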