Text-Mining Analysis Links Drug-Impaired Driving To Higher Injury Rates
“On medication” may not be one of the check boxes on carrier claims forms, but by hunting for such a description in adjuster notes, auto insurers can improve their results, a consultant reports.
Philip Borba, a principal and senior consultant in the economics consulting practice of Milliman Inc., recently led an analysis of narrative information related to nearly 7,000 auto accidents, uncovering a statistically significant relationship between descriptions of drivers’ drug use and the severity of accidents.
Using the incidence of bodily injury as a measure of severity, Borba reports that 73 percent of all auto accidents had a corresponding injury report. But for the nearly 16 percent of accidents where the narrative indicated that a driver was taking medication, the injury rate jumped to 82 percent, he says.
The study also investigated severity ties to phrases like “on many prescriptions,” as well as references to specific prescriptions and narcotic drugs. The results are summarized in the accompanying chart.
These tidbits of textual information about drug-impaired driving don’t just inform the claims-handling process, but they can also provide valuable input to underwriters, Borba says.
A report summarizing Milliman’s findings about DUID—driving under the influence of a drug—was published in the April edition of Milliman’s Insight publication. The report notes that other carrier benefits of identifying drug-impaired driving include better claims triage and assignment of liability.
“Finding that the other driver was DUID may be cause for a subrogation recovery against that driver, or provide enough additional evidence to increase the likelihood or size of recovery,” the report notes.
Explaining the claims triage benefit, Borba says that if carriers react quickly to any indication of drug use, they will be able to reroute claims to adjusters who are familiar with different medications and know how to test for them. That’s important because establishing that someone was DUID is a complex process, he says—much more involved than establishing a DWI.
“For alcohol, they’ve got the test down. It doesn’t matter how you came about the intake”—beer, wine or hard alcohol. “If you are a BAC [blood alcohol content] 0.08 or above, it’s a per se case that you’re driving while under the influence of alcohol.”
“That’s not so clear with drugs,” Borba says, explaining that there is no single test that is appropriate because different drugs metabolize differently in the body. “Per se cases are just harder to identify.”
While it may not be a breakthrough to discover that there is some relationship between DUID and accident severity—or cell phone use and accident severity, which was the subject of an earlier Milliman analysis—Borba highlights a broader takeaway about the power of narrative information.
“There is important information in text data that we [carriers] are not capturing as structured data,” Borba says. “While that information is not going to be the panacea, what we’re trying to do is to improve the claims-handling ability, the loss control [procedures] or the underwriting ability just by enough to make one carrier a little bit better than the other—to provide a competitive advantage,” he says.
“DUID is a difficult condition for insurance claim forms to have coded up,” Borba says, adding that “that’s exactly why” he was drawn to the topic.
“We don’t have a lot of boxes, or the boxes are not easy to work with, to check off whether or not we were driving under the influence of drugs, which drugs, or how the test was done.”
“That’s why the accident descriptions become very important in these situations,” he says, going on to describe the process Milliman developed to convert the unstructured data into usable structured data.
He explains that the starting point of the Milliman analysis was actually not claims adjusters’ notes, but accident descriptions from a publicly available National Highway Traffic Safety Administration database of 6,949 auto accidents, dating back several years. “At the time of the accident, they had a person collecting data on the circumstances of the accident as well as interviewing the people who were involved in the accident.”
These field researchers constructed narratives that are very similar to claim adjuster notes—each consisting of nearly a page of written text, ranging from 400 to 600 words, he says.
Using a computer to scan through the nearly 7,000 descriptions, Milliman broke the narratives down into one-to-six-word phrases—3 million of them, which are stored in a database. These could then be searched for any one of 10,000 key phrases that might indicate that someone involved in the accident was using drugs, Borba says.
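The phrase-extraction step Borba describes—sliding over each narrative and collecting every contiguous run of one to six words—can be sketched in a few lines of Python. This is an illustrative sketch, not Milliman’s actual code; the function name and sample sentence are hypothetical.

```python
def extract_phrases(narrative, max_len=6):
    """Return every contiguous 1- to max_len-word phrase in a narrative."""
    words = narrative.lower().split()
    phrases = []
    for n in range(1, max_len + 1):          # phrase lengths 1 through 6
        for i in range(len(words) - n + 1):  # every starting position
            phrases.append(" ".join(words[i:i + n]))
    return phrases

# A nine-word narrative yields 9 + 8 + 7 + 6 + 5 + 4 = 39 phrases
phrases = extract_phrases("The driver was on many prescriptions at the time")
print(len(phrases))          # 39
print("was on many" in phrases)  # True
```

At roughly 400 to 600 words per narrative, this kind of sweep over 7,000 descriptions would plausibly produce millions of stored phrases, consistent with the scale the article reports.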
The Milliman report explains that the key phrases came under four different themes:
- One for identifying the presence of a medication by joining phrases such as “on many” and “taking pain”
- Another for identifying the presence of “prescription” meds in a similar fashion
- A third theme joining an action, like “was on” or “had taken,” with a specific drug name
- A fourth theme scanning for any one of 52 illegal narcotics, including cocaine, heroin or marijuana.
Finding any of these phrases turned on a flag which would set a binary—or 0/1—variable to 1 to indicate the presence of medications, Borba says, noting this information could then be merged with all the other structured data Milliman compiled from the NHTSA database—the time of day, whether the driver had been fatigued, the weather at the time of the accident, etc.
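The flag-setting step can be sketched as follows. The phrase sets below are tiny invented samples standing in for the four themes and the 10,000 key phrases the article describes; the drug names and record fields are hypothetical.

```python
# Illustrative samples only -- not Milliman's actual key-phrase lists.
MEDICATION_PHRASES = {"on many medications", "taking pain medication"}
PRESCRIPTION_PHRASES = {"on a prescription", "taking prescription drugs"}
DRUG_ACTION_PHRASES = {"was on ambien", "had taken vicodin"}
NARCOTIC_PHRASES = {"cocaine", "heroin", "marijuana"}

KEY_PHRASES = (MEDICATION_PHRASES | PRESCRIPTION_PHRASES
               | DRUG_ACTION_PHRASES | NARCOTIC_PHRASES)

def medication_flag(narrative_phrases):
    """Return 1 if any key phrase appears among a narrative's phrases."""
    return int(any(p in KEY_PHRASES for p in narrative_phrases))

# The 0/1 flag is merged with the structured fields already on record.
record = {"daytime": 1, "good_weather": 1, "driver_fatigued": 0}
record["medication_flag"] = medication_flag(["was driving", "cocaine"])
print(record["medication_flag"])  # 1
```

Once merged this way, the text-derived flag sits alongside the coded fields and can enter a predictive model like any other variable.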
“Those are things that we already know,” or that carriers typically know from information they code and capture in their data systems. “The point of the exercise was to say, ‘Given what we already know, does the text data give us more information? Does it provide us with some lift?’” Borba says, using a term from the world of predictive analytics.
“Otherwise, we can just rely upon what we already know. If the time of day, the weather, and the nature of the accident are already good enough, then we don’t care about the text data.”
The particular analytic procedure Milliman used to investigate the predictive value was one known as a “logit analysis.” This is essentially a statistical regression analysis with just two outcomes—in this case “injury” or “no injury.”
Milliman found that adding information about the use of medications or narcotics did indeed improve the ability to predict whether or not an injury occurred.
“Without that information, our predictive ability was about 57 percent,” Borba says, referring to a 0.57 probability of an injury for an accident that occurred in the daytime, on a weekend, in good weather, on a dry surface, and where no alcohol was present, among other conditions.
“When we factored in whether or not a medication, prescription, a particular drug name, or a narcotic was involved, our ability to predict [injury] increased. It got up in the area of 75 or 80 percent,” Borba says.
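The logit mechanics behind these numbers can be sketched with a toy model: the log-odds of injury are a linear function of the predictors, and adding a 0/1 medication flag shifts the predicted probability. The coefficients below are invented for illustration—chosen only so the baseline lands near the article’s 0.57—and are not Milliman’s fitted values.

```python
import math

def injury_probability(x, coeffs, intercept):
    """Logit model: P(injury) = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    log_odds = intercept + sum(b * xi for b, xi in zip(coeffs, x))
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical predictors: daytime, good weather, medication flag (0/1 each)
COEFFS, INTERCEPT = [-0.2, -0.1, 0.9], 0.6

base = injury_probability([1, 1, 0], COEFFS, INTERCEPT)     # flag off
flagged = injury_probability([1, 1, 1], COEFFS, INTERCEPT)  # flag on
print(round(base, 2), round(flagged, 2))  # 0.57 0.77
```

With these made-up coefficients, turning on the medication flag moves the predicted injury probability from about 0.57 into the 0.75–0.80 range—the kind of lift the analysis reported.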
More precisely, the report shows that under the same specified set of conditions described by structured data—daytime driving, good weather, no alcohol, etc.—the probabilities of injury were:
- 0.75 when the accident description mentioned medication
- 0.72 when one of the drivers was on a prescription
- 0.72 when a specific drug was mentioned
- 0.85 when there was mention of a specific illegal narcotic.
The report highlights an 18 percentage point increase in the probability of injury when accidents involve someone on medication—a predictive lift that might be reason enough for a carrier to undertake the four labor-years it took Milliman to build the database.
In addition, Borba notes that now that the text phrases have been compiled into its database, analyzing the data set takes about two minutes, even for investigations of other predictive variables. “The power of what we have worked out is that if we want to look for different text, we don’t have to go and redo our 13 million phrases.”
“We created a database from the accident descriptions. That’s the way to understand this,” he says.
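The speed Borba describes comes from the fact that the expensive parsing is done once: re-querying means looking up stored phrases, not re-reading narratives. A minimal sketch, with a hypothetical in-memory phrase database in place of whatever storage Milliman actually used:

```python
# Hypothetical phrase database: claim ID -> phrases already extracted
# from that claim's narrative (the one-time, expensive step).
phrase_db = {
    "claim-001": {"rear-end accident", "was on", "dry surface"},
    "claim-002": {"wet pavement", "mold damage"},
}

def claims_mentioning(phrase_db, target):
    """Find claims whose stored phrases contain the target text."""
    return [claim_id for claim_id, phrases in phrase_db.items()
            if any(target in phrase for phrase in phrases)]

print(claims_mentioning(phrase_db, "rear-end"))  # ['claim-001']
print(claims_mentioning(phrase_db, "mold"))      # ['claim-002']
```

Searching for a new term—“mold,” “rear-end,” or anything else—only scans the stored phrases, which is why a fresh investigation can run in minutes rather than requiring the narratives to be reprocessed.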
Carriers with hundreds of thousands of accidents, instead of 7,000, could get even more granular in their research—analyzing results for a specific drug, like OxyContin, rather than a range of drug names, he notes.
“This was a proof-of-concept exercise,” Borba says. Applying the same text-mining approach, carriers now have the ability to work retrospectively—to discover claims relationships with items they never even thought about coding into their systems.
“I can go back and take all those claims that happened with Hurricane Sandy and if somebody says we want to know this additional piece of information about it, but we just forgot to code it on our forms—if it’s in the descriptions, I can go back and get that information. It’s a matter of finding the right phrases” in a narrative, Borba says.
Expanding on the potential application of text mining in the property claims arena, he says items like mold, mildew, wind damage, or water damage might be captured in adjuster notes even if they weren’t coded on claim forms initially at the time of loss.
Borba also sees a potential application in working through open workers’ compensation claims. “We can scan through the notes to find out which claims are just waiting for some bit of information to take the next step,” such as a doctor or lawyer’s report, he says.
In addition, efficiencies in auto insurance subrogation are possible by scanning for phrases like “rear-end accidents,” he says.
Borba notes that the process demonstrated by the DUID analysis can extend beyond the examination of claims adjusters’ notes. “We can do this with memos, with emails, any kind of text data,” he says.