Express Yourself: Extracting #emotional analytics from #speech
Posted on March 7th, 2016
03/07/2016 @WeWork, 69 Charlton St, NY
Yuval Mor & Bianca Meger @BeyondVerbal talked about the potential applications for their product. BeyondVerbal produces software, including their Moodies smartphone app, that assesses one’s emotional state through the intonation of one’s speech.
They take 13-second vocalizations (excluding pauses) and report the speaker’s emotional state as one of 432 combined #emotions: 12 basic emotions, which can appear in pairs (12 x 12), times 3 levels of energy (pushing out/neutral/pulling). They also monitor 3 indices: arousal, valence (positive/negative), and temperament (somber/self-controlled/confrontational).
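The 432-state arithmetic can be checked with a short enumeration. The labels below are placeholders, not BeyondVerbal’s actual emotion taxonomy:

```python
from itertools import product

# Hypothetical labels standing in for the 12 basic emotions
emotions = [f"emotion_{i}" for i in range(1, 13)]
energy_levels = ["pushing_out", "neutral", "pulling"]

# Every ordered pair of emotions (an emotion may pair with itself),
# combined with one of the three energy levels
states = list(product(emotions, emotions, energy_levels))
print(len(states))  # 12 * 12 * 3 = 432
```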
The software can be tricked by actors (and politicians) who are proficient at projecting the emotions of the characters they play. They do not do speaker separation but are resilient to some types of background noise. Speech after voice compression may be difficult to analyze since various frequencies are removed; however, they have improved their ability to analyze YouTube clips. They said there were differences in diagnostic ability for phonetic versus tonal languages, but many characteristics appear to be cross-cultural.
They claim to measure 100 different acoustic features but did not provide citations to academic research. Their validation appeared to be primarily internal, with a team of psychologists evaluating spoken words.
One potential application is predicting the onset of a heart attack based on one’s voice versus a prior baseline. They are currently conducting this research on 100 patients at the Mayo Clinic.
#IBM #Watson and #FacialRecognition
Posted on November 16th, 2015
11/16/2015 @Wework, 69 Charlton St, NY
Before the main presentation, Roberto Valenti @Sightcorp talked about his company’s development of face analysis technology. The technology can extract information from up to 50 faces simultaneously, including age, gender, mood (facial expression), ethnicity, and attention.
Future applications could include: home automation, gaming (mapping the face to an avatar, or using it as input), medical applications, and interactive ads in public spaces.
In the main presentation, Michael Karasick @ IBMWatson talked about the applications and APIs currently offered by Watson:
- Personality API which correlates word usage in one’s writing with the author’s personality.
- Analyze the tone of written works (e.g. email) to target a demographic
- Respond to questions over the phone
- Control emotional expressions for Pepper, a robot from Softbank
- Vision diagnosis of melanoma
- Chef Watson interprets recipes incorporating your food preferences
- Watson Stories summarizes stories using natural language analysis. Currently it is being refined using supervised learning under the guidance of an internal team at Watson: the system receives feedback on tone, etc.
Is #NUI the next breakthrough in #UI #Biometrics / #Personalization / #Identification?
Posted on September 21st, 2015
09/21/2015 @WeWork, 69 Charlton St., NY
Alexey Khitrov @SpeechPro spoke about his company’s #VoiceRecognition product. The product recognizes a user by his/her voice or voice + facial image. The voice + face system is used by Wells Fargo as one option to log into their mobile app. The system asks you to place your image within the view of your phone’s camera and then asks you to read aloud four numbers displayed on the screen. The system verifies both the image and the voice. The procedure replaces password-based logins.
The processing is split between the phone and the server but will eventually reside on the phone alone. The system examines the image for “liveness” and matches lip movements with the numbers spoken. Both checks use a variety of technologies to arrive at a verification score that can be passed to the server.
The Amazon Echo does not have a camera, so login requires voice input alone. For this, SpeechPro has developed a system that monitors continuous speech patterns and gives an evaluation for the conversation as a whole and for the conversation over the last 5 seconds. In this way, the system continually verifies that the speaker is a valid user and detects whether there is a change in speakers during the conversation.
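The whole-conversation score plus a rolling 5-second score could be computed as below. This is a minimal sketch of the windowing logic only; the similarity scores stand in for a real voice-biometric model, and the threshold is an assumption:

```python
from collections import deque

WINDOW_SECONDS = 5.0

def speaker_score(frame):
    """Stand-in for a real voice-biometric model: returns a similarity
    score in [0, 1] between this audio frame and the enrolled user."""
    return frame["similarity"]

def continuous_verify(frames, threshold=0.7):
    """frames: dicts with 'time' and 'similarity', in time order.
    Returns the whole-conversation average, the average over the last
    5 seconds, and whether the recent window passes the threshold."""
    recent = deque()
    total = 0.0
    for frame in frames:
        total += speaker_score(frame)
        recent.append(frame)
        # Drop frames that have fallen out of the 5-second window
        while frame["time"] - recent[0]["time"] > WINDOW_SECONDS:
            recent.popleft()
    overall = total / len(frames)
    window_avg = sum(speaker_score(f) for f in recent) / len(recent)
    return overall, window_avg, window_avg >= threshold

frames = [{"time": t, "similarity": 0.9} for t in range(10)]
overall, window_avg, ok = continuous_verify(frames)
```

A real system would also flag a sudden drop in the windowed score as a possible change of speaker mid-conversation.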
Lastly, Alexey talked about a further layer of security they are implementing. Instead of displaying the four-digit code that the user reads aloud, the digits will be hidden in a 5 x 5 matrix of numbers. Only the user will know which cells within the matrix contain the requested digits.
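A sketch of how such a challenge could work: the server fills the matrix with random digits, and the expected response is whatever happens to sit in the user’s secret cells. The cell coordinates and grid layout here are illustrative assumptions, not SpeechPro’s actual scheme:

```python
import random

GRID = 5
# Hypothetical user-chosen secret cells (row, column), known only to the user
SECRET_CELLS = [(0, 2), (1, 4), (3, 1), (4, 0)]

def make_challenge(rng=random):
    """Fill a 5x5 grid with random digits; the expected response is the
    digits sitting in the user's secret cells, read in order."""
    grid = [[rng.randint(0, 9) for _ in range(GRID)] for _ in range(GRID)]
    expected = [grid[r][c] for r, c in SECRET_CELLS]
    return grid, expected

def verify(spoken_digits, expected):
    return spoken_digits == expected

grid, expected = make_challenge()
```

An eavesdropper who records the spoken digits learns nothing reusable, since the next challenge grid places different digits in the secret cells.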
In summary, this is a product that could make it much easier to access your accounts on the internet while increasing account security. As this methodology becomes more popular, the security of the technology will be tested by new methods of attack.
“Sweet Talkin’ Woman” Exploring IPsoft’s #Amelia, an Artificial Agent
Posted on June 15th, 2015
06/15/2015 @WeWork, 69 Charlton Street, 8th floor, NY
Adam Pease @IPsoft presented his research on natural language processing (#NLP). He emphasized the importance of understanding the semantic structure of natural language and the limitations of the “bag of words” approach to automated language understanding.
IPsoft creates an artificial agent, Amelia, to automate customer service. It conducts a dialogue with users and is designed to handle step-by-step protocols for specific questions (e.g. monthly mortgage calculations), but has the flexibility to engage in small talk and to access databases and large corpora of background knowledge. It is also trained to express its level of certainty when presenting an answer.
If Amelia cannot answer a question, it will escalate to a human and then listen to the answer, attempting to learn the solution for when it faces similar inquiries in the future.
The next generation system will have an emotion model and a dialogue model. It could determine the emotion of the user, but a theory is still needed on how to handle the different emotions of users.
From this point, Adam emphasized the theory behind his NLP research emphasizing his work on ontology. He first reviewed some of the traditional information retrieval methods used by Amelia:
- Term frequency/inverse document frequency (TF-IDF) – greater importance is given to infrequent terms
- BM25 – two-phase: retrieve documents, then relevant sentences from each document
- #Word2vec – multidimensional vectors of docs. Can train on words and the conjunction of words from documents. Uses multifactor analysis to avoid overfitting the data.
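The TF-IDF weighting in the first method can be sketched in a few lines. The toy corpus below is made up for illustration:

```python
import math

# Toy corpus: three short "documents" as token lists
docs = [
    "the mortgage rate is fixed".split(),
    "the rate of interest varies".split(),
    "dialogue systems answer questions".split(),
]

def tf_idf(term, doc, docs):
    """Weight of a term in one document of a corpus."""
    tf = doc.count(term) / len(doc)              # term frequency
    df = sum(1 for d in docs if term in d)       # document frequency
    idf = math.log(len(docs) / df)               # rarer terms score higher
    return tf * idf

# "the" appears in two documents, "mortgage" in only one,
# so "mortgage" carries the greater weight in document 0
print(tf_idf("mortgage", docs[0], docs) > tf_idf("the", docs[0], docs))  # True
```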
He then talked about methods for mapping the semantic structure of sentences and why these approaches are important: they have the potential to create novel answers to problems.
The Stanford dependency parser graph extends the idea of sentence diagramming to create structures that can determine that “John walks to the store” is the statement that answers the question “Who walks to the store?”. This is done by matching the node “John” with the node “who”, and is robust to non-structural variations such as “amble” replacing “walk” in the original sentence.
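The node-matching idea can be illustrated with toy graphs. The edge format below is a simplification for illustration, not the Stanford parser’s actual output:

```python
# Toy dependency graphs: each edge is (head, relation, dependent)
statement = {("walks", "nsubj", "John"), ("walks", "prep_to", "store")}
question = {("walks", "nsubj", "who"), ("walks", "prep_to", "store")}

WH_WORDS = {"who", "what", "where"}

def answer(question_graph, statement_graph):
    """Find the wh-word edge in the question, then return the statement
    node filling the same (head, relation) slot."""
    for head, rel, dep in question_graph:
        if dep in WH_WORDS:
            for h2, r2, d2 in statement_graph:
                if h2 == head and r2 == rel:
                    return d2
    return None

print(answer(question, statement))  # → John
```

Robustness to “amble” replacing “walk” would come from normalizing head words to shared senses (e.g. via WordNet) before matching, which this sketch omits.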
WordNet is one approach to creating the framework for these structures. George A. Miller at Princeton started its development 20 years ago. It is an electronic dictionary containing 100,000 hand-created word senses and semantic links.
Adam then talked about his research on ontology (the “Suggested Upper Merged Ontology” contains 20k items and 80k axioms, linked with fact databases), which relates higher-order logical theory to the theory of language.
Related technologies include
- Expert systems – worked, but not visionary. Not reusable in the next project
- Semantic networks – match graph structures
- Semantic web – restricted representation.
More than words: why can’t machines learn to converse?
Posted on March 16th, 2015
3/16/2015 @WeWork, 69 Charlton St, NY
Rebecca J. Passonneau @Columbia University
Rebecca talked about some ways to increase the efficiency of #human-computer spoken dialog. Her main conclusion was that computer systems responded to human queries faster and more accurately when the system concentrated on understanding information that
- was most accessible to the user
- had the highest diagnostic value when querying the database
Becky started by describing some of the characteristics that separate spoken dialog from text queries.
She next described two experiments she conducted to better understand the user interaction when patrons called a librarian to request material from the “spoken book” collection of the New York Public Library. The experiments used these results to test how book retrieval could be facilitated using a better model of the user’s queries and information about the structure of the data in the database.
An example of information the user has at hand would be the author’s last name, as opposed to the ISBN. The database query would be quickest using the ISBN, but would consider the book title more diagnostic than the author’s name.
In addition to her conclusion that the best dialog would be a compromise between what the requestor knows and what the database finds most diagnostic, she talked about how the computer’s responses improve as the program remembers more of the previous parts of the conversation. So, for instance, a mispronounced author name might not be useful at first, but might become the key piece of information once other facts are known.
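The compromise between user accessibility and database diagnosticity could be framed as a weighted score for choosing which attribute to ask about next. The scores and weights below are made-up illustrations, not values from the experiments:

```python
# Hypothetical scores: how readily a caller can supply each attribute,
# and how sharply that attribute narrows the database search
attributes = {
    "isbn":        {"accessible": 0.1, "diagnostic": 1.0},
    "title":       {"accessible": 0.6, "diagnostic": 0.8},
    "author_last": {"accessible": 0.9, "diagnostic": 0.4},
}

def next_question(attrs, w_access=0.5, w_diag=0.5):
    """Pick the attribute with the best weighted compromise between
    what the user knows and what the database finds most diagnostic."""
    def score(name):
        a = attrs[name]
        return w_access * a["accessible"] + w_diag * a["diagnostic"]
    return max(attrs, key=score)

print(next_question(attributes))  # → title
```

With these numbers, the ISBN is too hard for the caller to supply and the author’s last name narrows the search too little, so the title wins the compromise.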
The experimental results show how different program strategies produce improved results, even when the strategies are mutually contradictory.