Gold B., Morgan N., Ellis D. Speech and Audio Signal Processing. Processing and Perception of Speech and Music

PDF file
size 10.10 MB

Added by user Shushimora 21.10.2013 17:36
Description edited 05.07.2017 02:24

John Wiley, 2011. — 679 p.

Technology moves at a dizzying pace; however, progress can actually seem quite slow in any area that we are deeply involved in. Conference proceedings are filled with incremental advances over previous methods, and entirely novel (and successful) approaches to speech and audio processing are rare. But a lot can happen in a decade, and it has. In addition to quite new methods, there are also many ideas that had not really been refined enough to show progress in the 1990s, but which now are in common use. For instance, Maximum Mutual Information methods, which were developed for ASR many years ago and were briefly described in the previous edition of this book, were significantly refined in the last decade, and the newer versions of this approach are now widely used. Consequently, we devoted new sections of this revision to MMI (and related methods like MPE).
These advances might have been sufficient to warrant an update of our textbook, but there were other reasons as well. A decade of teaching with the book has revealed a number of bugs and deficiencies, and a new edition affords us the opportunity to correct them. For instance, the previous version had nothing about sound source separation, an area that has received considerable attention in the last decade. Approaches to the coding, transcription, and retrieval of music are also now significant areas of audio signal processing, and were not originally covered in the book.
Last, and not least, the new edition has the benefit of a fresh look at the overall subject from our new co-author, Professor Dan Ellis from Columbia University. This hand-off is a key step in keeping the text current.
As with the previous edition, we've attempted to keep the overall style consistent, focusing on what we think is essential, and leaving many details for other publications. We hope that this choice has helped to make the text useful for many readers.

Speech and music are the most basic means of adult human communication. As technology advances and increasingly sophisticated tools become available to use with speech and music signals, scientists can study these sounds more effectively and invent new ways of applying them for the benefit of humankind. Such research has led to the development of speech and music synthesizers, speech transmission systems, and automatic speech recognition (ASR) systems. Hand in hand with this progress has come an enhanced understanding of how people produce and perceive speech and music. In fact, the processing of speech and music by devices and the perception of these sounds by humans are areas that inherently interact with and enhance each other.
Despite significant progress in this field, there is still much that is not well understood. Speech and music technology could be greatly improved. For instance, in the presence of unexpected acoustic variability, ASR systems often perform much worse than human listeners (still!). Speech that is synthesized from arbitrary text still sounds artificial. Speech-coding techniques remain far from optimal, and the goal of transparent transmission of speech and music with minimal bandwidth is still distant. All fields associated with the processing and perception of speech and music stand to benefit greatly from continued research efforts. Finally, the growing availability of computer applications incorporating audio (particularly over the Internet and in portable devices) has increased the need for an ever-wider group of engineers and computer scientists to understand audio signal processing. For all of these reasons, as well as our own need to standardize a text for our graduate course at UC Berkeley, we wrote this book; and for the reasons noted in the Preface, we have updated it for the current edition.
The notes on which this book is based proved beneficial to graduate students for close to a decade; during this time, of course, the material evolved, including a problem set for each chapter. The material includes coverage of the physiology and psychoacoustics of hearing as well as the results from research on pitch and speech perception, vocoding methods, and information on many aspects of ASR. To this end, the authors have made use of their own research in these fields, as well as the methods and results of many other contributors. And as noted in the Preface, this edition includes contributions from new authors as well, in order to broaden the coverage and bring it up to date.
In many chapters, the material is written in a historical framework. In some cases, this is done for motivation's sake; the material is part of the historical record, and we hope that the reader will be interested. In other cases, the historical methods provide a convenient introduction to a topic, since they often are simpler versions of more current approaches. Overall, we have tried to take a long-term perspective on technology developments, which in our view requires incorporating a historical context. The fact that otherwise excellent books on this topic have typically

Historical Background
  Synthetic Audio: A Brief History
  Speech Analysis and Synthesis Overview
  Brief History of Automatic Speech Recognition
  Speech-Recognition Overview
Mathematical Background
  Digital Signal Processing
  Digital Filters and Discrete Fourier Transform
  Pattern Classification
  Statistical Pattern Classification
Acoustics
  Wave Basics
  Acoustic Tube Modeling of Speech Production
  Musical Instrument Acoustics
  Room Acoustics
Auditory Perception
  Ear Physiology
  Psychoacoustics
  Models of Pitch Perception
  Speech Perception
  Human Speech Recognition
Speech Features
  The Auditory System as a Filter Bank
  The Cepstrum as a Spectral Analyzer
  Linear Prediction
Automatic Speech Recognition
  Feature Extraction for ASR
  Linguistic Categories for Speech Recognition
  Deterministic Sequence Recognition for ASR
  Statistical Sequence Recognition
  Statistical Model Training
  Discriminant Acoustic Probability Estimation
  Acoustic Model Training: Further Topics
  Speech Recognition and Understanding
Synthesis and Coding
  Speech Synthesis
  Pitch Detection
  Vocoders
  Low-Rate Vocoders
  Medium-Rate and High-Rate Vocoders
  Perceptual Audio Coding
Other Applications
  Some Aspects of Computer Music Synthesis
  Music Signal Analysis
  Music Retrieval
  Source Separation
  Speech Transformations
  Speaker Verification
  Speaker Diarization

See also

Benesty J., Sondhi M.M., Huang Y. (eds.) Springer Handbook of Speech Processing

Section: Speech Processing → Conference Proceedings

Springer, 2008. — 1159 p. The achievement of this Springer Handbook is the result of a wonderful journey that started in March 2005 at the 30th International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Two of the editors-in-chief (Benesty and Huang) met in one of the long corridors of the Pennsylvania Convention Center in Philadelphia with Dr Dieter Merkle...

18.16 MB
added 11.11.2011 11:06
description edited 31.03.2024 15:25

Park T.H. Introduction to Digital Signal Processing. Computer Musically Speaking

Section: Media Data Processing → Audio Processing

World Scientific, 2010. — 352 p. This book is intended for the reader who is interested in learning about digital signal processing (DSP) with an emphasis on audio signals. It is an introductory book that covers the fundamentals in DSP, including important theories and applications related to sampling, filtering, sound synthesis algorithms and sound effects, time and...

7.02 MB
added 04.05.2012 00:45
description edited 05.07.2017 04:25

Pirkle W. Designing Audio Effect Plug-Ins in C++: With Digital Audio Signal Processing Theory

Section: Media Data Processing → Audio Processing

Focal Press, 2013. XX, 534 p. ISBN: 978-0-240-82515-1 (pbk), ISBN: 978-0-123-97882-0 (ebk). Not just another theory-heavy digital signal processing book, nor another dull build-a-generic-database programming book, Designing Audio Effect Plug-Ins in C++ gives you everything you need to know to do just that, including fully worked, downloadable code for dozens of...

9.85 MB
added 17.03.2015 00:48
description edited 01.03.2016 19:20

Zölzer U. (ed.) DAFX: Digital Audio Effects

Section: Media Data Processing → Audio Processing

Second edition. — John Wiley, 2011. — 614 p. DAFX is a synonym for digital audio effects. It is also the name for a European research project for co-operation and scientific transfer, namely EU-COST-G6 Digital Audio Effects (1997–2001). It was initiated by Daniel Arfib (CNRS, Marseille). In the past couple of years we have had four EU-sponsored international...

10.39 MB
added 06.01.2013 03:26
description edited 15.04.2016 04:29

Zölzer U. Digital Audio Signal Processing

Section: Media Data Processing → Audio Processing

John Wiley, 2008. — 334 p. Digital audio signal processing is employed in recording and storing music and speech signals, for sound mixing and production of digital programs, in digital transmission to broadcast receivers as well as in consumer products like CDs, DATs and PCs. In the latter case, the audio signal is in a digital form all the way from the microphone right up to...

4.67 MB
added 06.01.2013 03:32
description edited 15.04.2016 04:26

Haykin S. Neural Networks: A Complete Course (Нейронные сети. Полный курс)

Section: Artificial Intelligence → Neural Networks

2nd ed. Translated from English. Moscow: Williams, 2006. 1104 p.: ill. The book covers the main paradigms of artificial neural networks. The material provides a rigorous mathematical justification of all neural network paradigms, is illustrated with examples and descriptions of computer experiments, and includes many practical problems as well as an extensive bibliography. The book also...

18.63 MB
date added unknown
description edited 21.11.2020 03:52
