The choice of microphone is essential to a spoken dialogue system. Furhat supports several types of microphones: mono microphones (for example table-top microphones), stereo microphones (for example Singstar microphones), or the Kinect array microphone.
For the development server, the built-in microphone of a laptop can work, but a headset is advised.
Types of microphones
If you don't need to separate the speech of individual users, a single microphone is enough. If you are testing with a headset or the built-in microphone of your laptop, this is the option to use.
If you have multiple microphones, you should use the stereo option, where you get to select two microphones. Furhat will then be able to differentiate between different users' speech using these.
If you have access to a Kinect, you can use it as a microphone by selecting this option.
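The stereo option works because each channel carries one microphone, so speech can be attributed to whichever channel is louder. A minimal conceptual sketch of this idea (this is an illustration only, not Furhat's actual separation logic, and the sample values are made up):

```python
# Conceptual sketch: attribute speech to one of two users by comparing
# the energy of the two channels of an interleaved stereo signal.
# Not Furhat's actual implementation.

def split_channels(interleaved):
    """De-interleave [L0, R0, L1, R1, ...] into (left, right)."""
    return interleaved[0::2], interleaved[1::2]

def louder_channel(interleaved):
    """Return 'left' or 'right' depending on which channel carries more energy."""
    left, right = split_channels(interleaved)
    energy = lambda ch: sum(s * s for s in ch)
    return "left" if energy(left) >= energy(right) else "right"

# Left user speaking, right channel mostly silent:
frame = [0.5, 0.01, -0.4, -0.02, 0.45, 0.015]
print(louder_channel(frame))  # left
```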
Setting the speech level threshold
The threshold of the microphone is an important setting, since it determines when the system registers that someone starts and stops speaking. If the threshold is too high, Furhat will not register that a user is trying to speak. If it is too low, background noise will be registered as speech too often, which will make the application seem unresponsive.
Note: you can monitor the speech level indicator bars on the Microphones or Dashboard pages. If they are yellow, speech is being picked up. If they are green, the sound level has not reached the threshold that activates listening.
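The thresholding described above can be sketched as a simple energy check on each audio frame. This is a conceptual illustration only, not Furhat's implementation; the sample values and the default threshold are made-up assumptions:

```python
# Conceptual sketch of speech-level thresholding (not Furhat's actual code).
# A frame "activates listening" only if its RMS energy exceeds the
# configured threshold; the threshold value here is an assumption.

import math

def rms(frame):
    """Root-mean-square energy of one audio frame (samples in [-1, 1])."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def is_speech(frame, threshold=0.1):
    """True if the frame is loud enough to count as speech."""
    return rms(frame) >= threshold

silence = [0.01, -0.02, 0.015, -0.01]   # quiet background noise
speech  = [0.4, -0.5, 0.45, -0.35]      # louder, voiced frame

print(is_speech(silence))  # False: below threshold, listening not activated
print(is_speech(speech))   # True: above threshold, registered as speech
```

Lowering `threshold` far enough would make the silent frame count as speech, which is exactly the failure mode described above.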
Currently, Furhat supports the Microsoft Kinect, an affordable camera/IR/microphone sensor, and the Intel Realsense SR300 camera.
Make sure the Kinect is plugged in and positioned so that it can capture the interaction space in front of Furhat. Restart Furhat. When Livemode starts again, you should have a Kinect menu item in the left menu. Go there to test the Kinect and set its position, so that Furhat knows how to map the users' positions relative to the Kinect.
For installation on a dev machine, follow these instructions. Note that you need a Windows computer to install the Kinect SDK.
Stand in front of the Kinect and make sure you are clearly visible. When the Kinect starts to track you, your face and hands will be marked in the camera view. You should then also appear in the situation view, and the system should start interacting with you. If you have a Kinect V2, your head pose will also be tracked.
To test an application using a visual sensor on a development machine without a Kinect, you can use virtual users. To toggle them, go to the Dashboard, enter the situation view, and add virtual users.
On the web interface's Kinect page, you can set the physical position (in meters and degrees, relative to Furhat) of the Kinect, so that Furhat knows how to parse the coordinates received from the Kinect and map them to his own interaction space.
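Mapping a sensor-relative position into the robot's interaction space is essentially a rigid transform built from the offset and rotation you enter on this page. A minimal 2D sketch of the idea (the sensor offset and yaw values are made-up examples; Furhat performs the real mapping internally):

```python
# Minimal sketch of mapping a sensor-relative position to the robot's frame.
# Assumed example setup: the sensor sits 0.5 m to the robot's side (x) and
# 0.2 m in front of it (z). Furhat's internal mapping is not public, so this
# only illustrates the geometry.

import math

def sensor_to_robot(x_s, z_s, sensor_x, sensor_z, yaw_deg):
    """Rotate a point from the sensor frame by the sensor's yaw,
    then translate by the sensor's position in the robot frame."""
    yaw = math.radians(yaw_deg)
    x_r = sensor_x + x_s * math.cos(yaw) - z_s * math.sin(yaw)
    z_r = sensor_z + x_s * math.sin(yaw) + z_s * math.cos(yaw)
    return x_r, z_r

# A user the sensor sees 1.5 m straight ahead:
x, z = sensor_to_robot(0.0, 1.5, sensor_x=0.5, sensor_z=0.2, yaw_deg=0.0)
print(round(x, 2), round(z, 2))  # 0.5 1.7 (with zero yaw, a pure offset)
```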
If you want to use the Kinect as a microphone, you can set this up on the Microphones page.
Beware that the Kinect is mainly meant to capture bodies, not faces, and its microphone array is not very good. Specifically, it is poor at picking up short single words like "five"; longer phrases usually work better.
Your Furhat may not have the software required for Realsense SR300 cameras. If this is the case, when you run Furhat with a Realsense camera connected, you will get a message on the console informing you that the Realsense software has not been installed. To enable Realsense cameras, download and install the Realsense SDK on your Furhat. Turn Furhat off and on again after installing the Realsense software.
Open up the wizard menu on the Furhat webserver. Have someone stand in front of the camera; a green rectangle should appear around their face. If it does not appear, have the person move within the camera's optimal range of 1 meter.
Open up the Realsense tab on the wizard menu. Click the toggle button under 'Realsense'. Additional data will now appear in the camera panel. The additional buttons on this interface allow you to hide the information you do not want to see on the camera panel.
The landmarks that the Realsense camera tracks will be dotted on the user's face. They appear white if the camera has confidently recognized the user, and red if the camera is not confident that the landmarks match the user's face.
On the right, a list of expressions will appear, each with a value ranging from 0 to 100. Test this by smiling at the camera: the smile expression value will increase.
The user's pulse is shown at the top-left of the rectangle. Pulse tracking loses accuracy rapidly as the user moves further from the camera. A pulse of 0 or -1 means that the Realsense has not yet determined the user's pulse.
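Since 0 and -1 are sentinel values rather than real readings, application code should treat them as "no reading yet". A tiny sketch of such handling (a hypothetical helper, not part of any Furhat API):

```python
# Hypothetical helper: interpret a raw pulse value from the tracker.
# As described above, 0 and -1 are sentinels meaning "not yet determined".

def pulse_reading(value):
    """Return the pulse in BPM, or None if the tracker has no reading yet."""
    return None if value in (0, -1) else value

print(pulse_reading(-1))  # None: no reading yet
print(pulse_reading(72))  # 72: a real reading
```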
On the web interface's Realsense page, you can set the physical position (in meters and degrees, relative to Furhat) of the Realsense, so that Furhat knows how to parse the coordinates received from the Realsense and map them to his own interaction space.
The Realsense camera works best at short to medium range and will lose users completely beyond 1.8 meters. Capture of users' expressions is best when they are facing the camera.