If you are one of those people who finds it difficult to multitask (I am one of those), you might appreciate this story involving court reporters. I work as a lawyer during the day, and quite often I need to take depositions, which are reported in real time by court reporters who use a special keyboard to take down every word of the deposition. The best court reporters are truly incredible to watch. To be a court reporter, you need to take down at least 200 words per minute without mistakes. You would think that trying to take down every word spoken by everyone in a room would completely occupy your working memory, but good court reporters can do their work proficiently with mental processing capacity to spare.
Last week I spent an entire day taking depositions. After the depositions were finished, I asked the court reporter what she was daydreaming about. She smiled, because she knows that experienced court reporters are perfectly capable of daydreaming about such things as grocery shopping or going to the beach at the same time that they are taking down every syllable of every word spoken in the room.
I asked this particular court reporter how often she has to go back and look at her transcript to see what was said, because she was thinking about something else while she was taking down the testimony. She told me that she once worked for a judge who was going to sentence a man convicted of murder. The big question that day was whether the man would be put to death or receive a life sentence. This court reporter was assigned to preserve all of the court proceedings regarding this momentous sentencing. After she finished taking down the testimony and left the courtroom, someone asked her whether the judge had sentenced the accused to death. She hesitated before replying that she did not know, even though she was the court reporter who had recorded the hearing. To find out, she went back to her tape (the strip of paper on which the court reporter’s keyboard prints out the testimony) and looked for the critical part. The judge had in fact sentenced the man to death, but she had no memory of it.
I asked her whether she is ever asked to read back testimony during a court proceeding or deposition at a time when she is nervous that she might not have been taking it down accurately. She said that this never happens, and that she is always confident she is taking down the testimony accurately. If something starts going wrong, her full consciousness kicks in and she deals with the unusual situation fully aware. She has never been caught failing to take down the testimony accurately.
I find it pretty amazing that someone could have her working memory so thoroughly occupied, in the linguistic sense, and yet be able to think about other things. It’s even more amazing that when court reporters daydream or think, they are often doubly employing their linguistic abilities. It seems like this should be impossible, but it’s commonplace.
Most court reporters today use a special stenographic keyboard, but a few speak into something that looks like a muzzle. They hear the testimony in the courtroom with their own ears and simultaneously speak those words into this muzzle device, which is recorded by a tape recorder. In short, they “shadow” the testimony with their own voice. Later, someone types the court reporter’s words into a transcript. I’ve spoken to some of these muzzle-device court reporters over the years, and they tell me that they too are able to think about other things or daydream while they are taking down the testimony.
If you are wondering why we even have court reporters, that would be a good question. The main advantage is that when you have a court reporter, you have a person who is in a position to swear to the accuracy of the transcript, indicating who said exactly what. A tape recorder would simply record the sounds, and might not accurately pick up the exact words that were being spoken (for instance, because someone is mumbling or gesturing). When these sorts of things happen at a deposition, human court reporters ask the witness to speak up or to state their testimony in words rather than gesturing. This makes for a more accurate and more readable transcript. That said, some courtrooms are now employing tape recorders in lieu of court reporters.
Back in the 1990s it was convincingly demonstrated that our brains multitask at the symbolic level. That is, many things are busily going on while the little piece that runs your conscious awareness monitors only one of them. The mind can be trained much like the body, allowing reflexes to handle many complex functions without oversight from the conscious mind.
Why is it that we are not amazed that one can walk without intentionally telling each quadriceps (and each of the hundred other muscles involved) when to contract, or being aware of all the sensations involved (orientation, rotation, balance, forces, wind, etc.), yet it amazes us that we can process specific types of information, as in in-the-ear, out-the-fingers transcription?
What amazes me is that my wife can converse while typing something else. At least until she empties the buffer of what she'd read or thought of to write: a couple of sentences' worth. Amazing.
I think a word and it comes out my fingers. But I am limited to a word at a time. If I invested the time to train, I could get to her paragraphic level of typing. Probably. Old dog.
Legacy is the basis of jurisprudence. One example that bugs me is that eyewitness accounts are considered the only legal form of evidence. But it has been thoroughly proven that this is the least reliable form of evidence. My experience sitting on juries is that testimony falls somewhere between
a) An account of what the witness thinks they remember of what they believe they saw, and
b) What they invent and hope you'll believe. As in, "Naw, I ain't told [those other witnesses] what they's all said I did"
Having a stenographer take down every word predates typing and other recording devices. It does still serve the two needs Erich ended up with: prompting for verbal clarity as needed, and providing an impartial witness to the proceedings.
But a Dragonware-like app can almost already do such prompting, and the stenographer is not actually a conscious witness (as Erich described). She can only testify that the words did flow through her to the tape.
Dan: I would add that in addition to oral evidence a second major category of evidence is documentary evidence (including written statements, photographs and business records) which, in fact, drives the verdict in many cases. Other types of evidence are also important (e.g., physical objects, scientific tests), though these things do rely upon human beings to establish a foundation for admissibility.
My experience with voice recognition (and I'm a constant user of Dragon) is that we are a long way from relying on it for an accurate account. As amazing as it is as a dictation tool, 99% accuracy would get a human court reporter fired in a heartbeat. Many of my posts are written through Dragon. I always need to make a few corrections before publishing, and you have been gracious enough to point out the typos that I miss on a regular basis.
Dragon may not yet be perfect in transcription, but it could easily tell when it is having trouble: mumbling, indistinct word separation, low overall volume (the usual causes of "speak up").
Dan: That is often not my experience. When I use Dragon, it spits out the closest fit to the words I utter, and the result can sometimes be dramatically different from what I actually said. It doesn't display any sort of confidence level; Dragon is ALWAYS confident! The exception would be if I were to cough, at which point Dragon doesn't recognize any text. Any time I say real words, Dragon prints real words (quite often, but not always, the same words I am speaking).
But as I mentioned above, 99% accuracy would quickly get a court reporter fired.
Erich, there is a basic difference between what any software does internally and what it shows a user. Internally, Dragon knows its own confidence level, the sound levels, the sound-distinction levels, the frequency distribution of each sound, and the frequency distribution and volume of the background noise.
For a consumer dictation program, all it displays is its best guess at written words to match the sounds with as little interruption of the free flow of utterance as possible.
I was not suggesting using off-the-shelf 2009 Dragonware in lieu of a stenographer. I was suggesting that the kernel of the program knows when to ask someone to speak more clearly. This could be used to complement other automatic recording devices such as face-centering video cameras with directional mics for each speaker.
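To make that concrete, here is a rough Python sketch of confidence-gated prompting. Everything in it is my own invented stand-in for illustration: the RecognitionResult fields, the thresholds, and the function names are assumptions, not anything Dragon actually exposes.

# Hypothetical sketch: how a recognizer's internal scores could drive a
# "please speak up" prompt. Field names and thresholds are assumptions
# for illustration, not Dragon's actual API.
from dataclasses import dataclass
from typing import Optional

@dataclass
class RecognitionResult:
    text: str          # best-guess transcription of the utterance
    confidence: float  # recognizer's internal confidence, 0.0 to 1.0 (assumed)
    volume_db: float   # average input level of the utterance in dB (assumed)

MIN_CONFIDENCE = 0.90   # below this, the guess is too shaky to trust (assumed)
MIN_VOLUME_DB = -30.0   # below this, the speaker is probably too quiet (assumed)

def clarity_prompt(result: RecognitionResult) -> Optional[str]:
    """Return a prompt for the speaker, or None if the utterance is usable."""
    if result.volume_db < MIN_VOLUME_DB:
        return "Please speak up."
    if result.confidence < MIN_CONFIDENCE:
        return "Please repeat that more clearly."
    return None

# Example: a low-confidence utterance triggers a prompt instead of a silent guess.
print(clarity_prompt(RecognitionResult("sustained", confidence=0.62, volume_db=-12.0)))

The point is only that the scores already exist inside the program; routing them to a prompt is the easy part.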
Dan: I can just hear the little computer speaker barking: Speak up, please! Then again, that's what court reporters sometimes need to do.
Erich,
Much recognition software employs an artificial intelligence programming technique known as a neural net simulation.
Neural net simulations run many parallel sub-programs, called nodes, that independently analyze the input and produce a list of possible results. Each node starts with a different list of possible results. Each node votes for the item that seems most likely to be the result, and the percentage of nodes voting for the most popular result is computed. That percentage is then compared to a preset percentage called the confidence threshold. If the percentage is below the threshold, each node's list is adjusted to favor the more popular result and the input is run back through the analysis. This repeats until the confidence threshold is reached or the voting percentage no longer changes. The actual confidence score is the ratio of results that meet or exceed the preset confidence threshold out of all the results processed.
So basically the software can be 99 percent confident that you are saying "hair" when you are really saying "hare".
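Here is a toy Python sketch of that voting-and-rerun loop. The "nodes" are reduced to simple weighted guessers, and every name and number is invented for illustration; a real neural net simulator is far more involved than this.

# Toy sketch of the voting loop described above: nodes vote, the winning
# share is compared to a preset confidence threshold, and if it falls short
# the nodes are nudged toward the popular answer and the input is rerun.
import random
from collections import Counter

CANDIDATES = ["hair", "hare"]   # possible results for one utterance (invented)
CONFIDENCE_THRESHOLD = 0.80     # preset voting percentage required (invented)

def node_vote(weights):
    """One node picks the candidate it currently favors."""
    return random.choices(CANDIDATES, weights=weights)[0]

def recognize(num_nodes=100, max_passes=10):
    # Every node starts with its own preference weights (here: uniform).
    node_weights = [[1.0, 1.0] for _ in range(num_nodes)]
    winner, share, last_share = None, 0.0, 0.0
    for _ in range(max_passes):
        votes = Counter(node_vote(w) for w in node_weights)
        winner, count = votes.most_common(1)[0]
        share = count / num_nodes
        if share >= CONFIDENCE_THRESHOLD or share == last_share:
            break
        # Below threshold: nudge every node toward the popular result and rerun.
        for w in node_weights:
            w[CANDIDATES.index(winner)] *= 1.5
        last_share = share
    return winner, share

print(recognize())   # e.g. ('hare', 0.83): confident, but possibly still wrong

Notice that the loop converges on whichever answer got an early lead, which is exactly why high confidence is not the same thing as being right.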
Back in the '90s I worked with a commercial AI system that read handwritten applications and output text for use in a database system. The system used a 300 MHz CPU that interfaced to a high-performance document scanner, plus a second 300 MHz PC that hosted a custom hardware neural net simulator designed around four Intel i860 processors (the i860 was a 64-bit, high-performance, pipelined RISC processor, often touted as the "Cray on a chip"). Ten years later, any mid-range PC could do more than that $8000 co-processor board was able to do.
The AI system achieved about 98 percent accuracy with the confidence threshold set at around 85 percent. Some of the problem handwriting was difficult even for our editing staff to read.