Alexa is a terrible doctor

We become accustomed to technology so quickly, it’s easy to forget that Amazon’s Alexa was a pioneer just six years ago.

The idea was to introduce into people’s homes and offices a personal assistant so knowledgeable, so available, and requiring such minimal human effort that she would become ubiquitous. The utterance of a single word—her name—causes Alexa to spring into action. Without opening a laptop, tablet, or mobile device, we can now set reminders for ourselves, shop, and get answers to simple but essential questions like the weather as we go about our morning routine. Alexa, and the family of devices that rely on her, are programmed to become smarter as we use them, learning to recognize our voices, speech patterns, and usual requests.

A massive success, Alexa is now in some 20 million American homes. Experts estimate Amazon accounts for more than 70% of the US market for “smart speakers” equipped with a voice user interface (VUI). As familiar members of the household, Alexa and her ilk have been trusted with tasks as essential as ordering our Ubers and as personal as singing our children to sleep. She is efficient, unbiased, and reliable—basically, all the qualities of the best human personal assistants, without any of the worry that comes from someone knowing the particulars of your life.

Given all that, it seems like Alexa would be the ideal assistant to handle questions about perhaps the most important, and most personal, subject matter: our health. But on this front, she’s got a lot of learning to do.

An evolution from search to spoken word

Looking up your symptoms ahead of scheduling a doctor’s appointment makes sense: Why waste time and money if it’s just a small cold that can be treated at home? Little wonder, then, that a Pew Research Center survey from 2013 found that over a third of people living in the US had used the internet to self-diagnose at least once, and a quarter of people living in the UK had done the same. Globally, about 1% of Google’s total queries in 2016 were related to health.

Of course, consulting the internet is nowhere near as good as talking to an actual physician about your symptoms. It’s difficult to get an answer to any kind of complicated question online—and if you do, there’s no guarantee it’s right. Many of the internet’s answers to health-related questions tend to be far-reaching or vague. They give an overview of the possible causes of a symptom, which can include unlikely conditions like cancer—even when an actual diagnosis to confirm such an affliction would take comprehensive testing. It’s easy to get freaked out, and taking an actual trip to the doctor’s office is an expensive way to calm that worry. Even if health care providers don’t suspect the same diagnosis as their patient, they’ll likely order tests to definitively rule it out. It’s a signal to their patients that they’re taking them seriously, and ensures patients leave satisfied.

On the flip side, given their lack of specific answers, search engines might miss an important symptom that really should give you cause for concern. A sharp pain in the stomach could be anything from menstrual cramps to irritable bowel syndrome. If you routinely use Google searches to pass off IBS—a chronic disease that requires lifelong treatment—as bad shrimp, you’ll eventually wish you had gone to a doctor sooner.

Googling symptoms is so common there was no way Alexa wouldn’t include some sort of search functionality. It’s not clear, though, that she can compete with the tools on the internet, as problematic as those are.

If you’re seeking an answer to a health question on a graphical user interface-based operating system—the screen-based systems like Android, iOS, and Windows that you probably use every day—you’re likely to use a search engine. That’ll bring up thousands of options for you to scroll through. Even though the results are organized by an unseen algorithm, sourcing for any given piece of information is probably accessible (and if it’s not, you’ll see the lack of sourcing, and can then choose to ignore that answer), and you can cross-check answers across multiple sites.

Alexa works differently. Although she is wonderfully helpful with basic questions, her knowledge of the complex medical world is limited, and Alexa generally only serves up one answer. You can’t flick between a dozen browser tabs to determine which strain of tropical parasite is (probably not) besieging you.

There are two ways to ask Alexa about your health issues. First, you can ask her built-in search tool, in which case she will cite one of Amazon’s “verified trusted sources,” which, according to a spokesperson from the company, includes Stats.com, IMDb, Accuweather, Yelp, Answers.com, Wikipedia, and WebMD. Although neither Alexa nor Amazon says where the artificial intelligence finds each of her answers to your health questions, we can reasonably presume she’s getting her health information from WebMD.

Alternatively, you can install and then put your question to a “skill,” the Amazon VUI equivalent of a smartphone app. There are now approximately 1,000 health-related skills on Amazon available for download, most of them free, and ranging in quality; they are cumbersome at best, and peddlers of pseudoscience at worst.

Quartz visits the Alexa skills store

Trusted sources like Vanderbilt University, the Mayo Clinic, and Boston Children’s Hospital all have skills available in the Alexa store, but others are dubious, and don’t clearly disclose who or what is behind them. In February, when we first began reporting this story, we reached out to Amazon to ask them about the approval process for health-related Alexa skills, and how, if at all, the company regulates them. A company spokesperson responded by asking for examples of what kind of skills we wanted to know about, so we sent over a list of 19 we were testing out at the time.

Amazon told us anyone can develop their own and submit it for review. The company declined to disclose how its Alexa-store review process works, but did say it requires that all skills meet Amazon’s written developer policies. For health-related skills, that means three things:

Skills may not collect personal medical information from customers

Skills cannot imply that they are life-saving through their names or descriptions

Skills must come with a disclaimer that they are not “medical advice,” and that users should ask their health care providers if they believe they may need medical attention

Failure to meet these requirements gives Amazon the right to remove a skill from the library. (Amazon declined to say whether or not they’ve ever actually done this.) Some skills also append a (voluntary) caveat to the required disclaimer: that their information may be false or for “entertainment purposes only.”

In February, Amazon reviewed the skills we sent them and discovered that several did not include disclaimers. For a brief period, they made those skills unsearchable in the Alexa skills store while they relayed the issues to the skills’ developers.

A few months later, in mid-April, Quartz analyzed all 915 skills published under the “health and fitness” category in the Alexa skills store, and identified 65 medical skills without a disclaimer. We also revisited the 19 skills we had initially sent to Amazon. As of publication, 18 were still live in the skills store; the one that had been removed entirely was Herbalife, a Los Angeles, California-based multi-level-marketing scheme that sells nutritional supplements and weight-loss products. Of the 18 still live, all but one did, in fact, have a disclaimer. The exception is Medical Symptoms, which Amazon had kept unsearchable since February.

On July 2, Quartz sent Amazon the complete list of the 65 skills missing a disclaimer. The company repeated what it had told us in February: While Amazon doesn’t control who uploads skills to the library, it “routinely audit[s] Alexa skills, and if they are not compliant, we work quickly to communicate with the developer and take action on behalf of customers.”

When asked when Amazon had performed its most recent audit of the skills store, the spokesperson told Quartz that all skills are audited on an ongoing basis.

While Amazon does not actively monitor these technologies, by forcing every Alexa health skill to include a disclaimer, the company is (in the US at least) legally blameless for any problematic advice those skills give out. If someone were to sue the company for issues stemming from a skill’s medical guidance, “Amazon would assert the defense that the customer had assumed the risk,” Paula Berg, a law professor at the City College of New York, said in an email.

As of July 10, the 65 medical skills still stood without a disclaimer. Some are relatively innocuous, like My Toothbrush, which “guides you through a timed session focusing on all the different areas of your mouth, so you don’t miss any spots.” Others, however, clearly offer medical advice. For example, MS Awareness Facts “allows you to find helpful information on MS and its symptoms with potentially helpful suggestions,” while the developers of Chronic Diseases Tips describe it as “built for helping chronic disease patents [sic] or people would like to have proper knowledge of chronic diseases.” Another advertises itself as a source of information about suicide.

Does this hurt?

Even the skills that do have a disclaimer can be problematic. If you download a skill and activate it on your home device, you would presumably see what company or organization is behind the tool and, theoretically, any medical disclaimer. But many people—reasonably—won’t do that due diligence. In addition, anyone in a household can use an Alexa skill regardless of who installed it, and there’s no guarantee that one family member will understand or question the sourcing of a skill downloaded by another. It’s even less likely that someone who didn’t install a skill would go back to the Amazon store to read through its disclaimers.

And the disclaimers often mislead users about what the skills actually do. In order to learn more about the range of health-related skills in the Alexa store, Quartz took all 915 and organized them into four groups:

Health care: Skills that provide basic information regarding health or health care. These will tell you what chemicals are in a particular medication, for example, or how a certain type of insurance works.

Trackers: Skills that serve as health trackers. Some will remind you to take your medication; others will let you know how many carbs you’ve eaten, or how many times you’ve used the bathroom (assuming you’re willing to tell Alexa these details about your life).

Guided activities: Skills that provide guidance for things like meditation, cardio, and core-workout sessions.

Diagnostics and treatment: Skills that claim to diagnose you or suggest treatment options. These include symptom checkers that say they can distinguish between a cold and the flu, diagnose and offer advice on how to treat minor ailments, and tell you how to perform CPR, to name a few examples.

Despite disclaimers saying otherwise, the Alexa skills we categorized as “diagnostics and treatment” are clearly dishing out advice. If you ask a question about your symptoms, and a skill’s response includes a diagnosis, are you really being “entertained”?

Quartz tested 16 of the skills in this category by asking them each 14 questions about theoretical common health symptoms impacting the body from head to toe. (There were 19 skills total in “diagnostics and treatment,” but three were tied to a specific paid product that required a login, so were excluded from our final analysis.) We then asked two independent physicians to evaluate the responses.

Of these 16 skills, only 10 were able to give at least one valid answer. In general, the most detailed prognoses were for flu symptoms, sore throats, and diarrhea. Questions related to less common symptoms got fewer responses.

Interactions with these skills all follow one of two patterns: In one scenario, the Alexa skill identifies keywords as you speak, performs a database search for those keywords, and spits out everything related to the keywords from its database. Nine of the 16 skills we tested fell into this category. In the remaining seven, the voice assistant engages you in a conversation, asking a series of follow-up questions to narrow your issues down to the most likely diagnosis, and then provides treatment options.
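To make the contrast concrete, here is a minimal sketch of the two patterns in Python, using a toy symptom database and an invented decision tree; the keywords, answers, and questions are hypothetical and are not drawn from any actual Alexa skill.

```python
# Pattern 1: keyword matching. The skill scans the utterance for known
# keywords and returns everything it finds in its database.
SYMPTOM_DB = {
    "flu": "Flu symptoms include fever, body aches, and fatigue.",
    "stomach flu": "Stomach flu is usually caused by contaminated food or water.",
    "headache": "Common causes include tension headaches and migraines.",
}

def keyword_skill(utterance: str) -> str:
    matches = [answer for keyword, answer in SYMPTOM_DB.items()
               if keyword in utterance.lower()]
    return " ".join(matches) if matches else "Sorry, I don't have information on that."

# Pattern 2: a conversational decision tree. The skill asks yes/no follow-up
# questions and walks toward a leaf, which holds a suggested diagnosis.
DECISION_TREE = {
    "question": "Do you have a fever?",
    "yes": {"question": "Do you have body aches?",
            "yes": "You may have the flu.",
            "no": "You may have a common cold."},
    "no": "Your symptoms don't obviously suggest the flu.",
}

def conversational_skill(answers: list) -> str:
    node = DECISION_TREE
    for answer in answers:
        node = node[answer]
        if isinstance(node, str):   # reached a leaf: a suggested diagnosis
            return node
    return node["question"]         # still mid-conversation: ask the next question

print(keyword_skill("How do I know if I have the flu?"))   # dumps every "flu" match
print(conversational_skill(["yes", "yes"]))                # "You may have the flu."
```

The first pattern explains why vague questions return either nothing or everything at once; the second explains why a skill’s usefulness depends entirely on how its question tree was built.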

In the first interaction pattern, the skill starts by attempting to pull out the right keyword from a user’s question. Sometimes, this is a simple task. When we asked “How do I know if I have the flu?” seven of the nine skills were able to identify “flu” as the right keyword, then find it in their databases and come up with a reasonable answer.

The other two, however, failed to answer even this seemingly straightforward question. When we asked the Mayo Clinic skill this query, Alexa went off the rails, droning on about “food or water contaminated by bacteria” before we commanded her to “STOP!” Further testing revealed that though “flu” was somehow not in Mayo Clinic’s database, “stomach flu” was—but how would Alexa know to distinguish at this level of granularity?

Then there was Zana AI, which seemed to be more capable of recognizing conditions in birds than humans. It responded to our flu question with: “Bird flu or avian flu is an infectious virus that spreads among birds. In rare cases, it can affect humans.” When we asked Zana, a Germany-based AI company, why their Alexa skill gave us “bird flu” information, its cofounder and CEO Julia Hoxha told us that the keyword “flu” yields four possible entities in its database. “Bird flu”—the first entity in the database search—was the only response returned by Alexa during our test.

This was a pattern: More often than not, the database-driven skills had difficulty picking out the right keywords from our 14 questions. We tried to get around the failures by rephrasing the questions, and found that to get anything useful, we had to be very specific when describing symptoms. Asking “Why does the left side of my head hurt?” yields far fewer answers than asking specifically about having a “headache.”

In other cases, it seemed the skills were papering over their poor research skills with long, jargon-filled responses. When we asked Virtual Nurse “Why does the left side of my head hurt?” the “nurse” rambled on for five minutes, listing out every possible cause, including tension headaches, painkiller-medication-overuse headaches, and migraines, among others, with treatment advice for each. By the time we got to the end, there was no way we could remember the options at the start—it felt like Alexa had just recited the entire “headaches” section of a medical-school textbook.

Adam Coley, the managing director of Lowaire, the company behind the Virtual Nurse skill, says the company is accepting user requests to add new medical questions not currently in the database. More than 900 subjects have been added so far, and Coley says the Virtual Nurse’s ability to understand the arbitrariness in human language is improving daily as they feed the AI more data.

Then there were the skills that attempted to converse with us in a way that mimicked real-world doctor-patient interactions. Sometimes this worked: The Flu Tool skill, designed by researchers at Vanderbilt University, was able to correctly diagnose “common cold” after asking us just eight questions.

Other times, these conversational skills weren’t nearly so smooth. The GYANT skill, developed by a San Francisco-based startup of the same name, for example, had us answer 25 questions before offering a diagnosis. In addition to symptom-related queries, GYANT also asked for a number of personal details, including age, gender, location, and prior medical history: “Do any of the following apply to you?” it asked, “HIV-positive, cancer, chemotherapy, organ-transplant, damaged or no spleen, chronic steroids for treatment, multiple sclerosis?” Besides having to navigate this confusing line of questioning, we received no indication of how many queries were left to answer. We almost ran out of patience before it reached a diagnosis.

The final results weren’t particularly useful, either: “Here are the possible causes I’ve found. You can ask me more about any of them. Common cold, acute upper-respiratory infection, or viral pharyngitis.” Two of those three results were pretty technical terms with no further explanation—and how are you meant to parrot back a medical phrase you’ve never heard and can’t pronounce? A Google search would have taken much less time for clearer results.

We asked some of the developers of these skills why we’d gotten such seemingly bizarre answers. In some cases, we were told the skills weren’t designed to answer every question we asked. “The skill was designed with the first-aid topics scope in mind, and currently does not include responses to common viral infections such as influenza,” wrote a spokesperson from the Mayo Clinic. Although she wasn’t sure about the stomach flu response in particular, the spokesperson said it must have been considered an acute symptom of an ailment based on our language. It’s not clear why that would matter to someone concerned they might be sick with the flu.

Dr. Alexa vs. real doctors

Our tests suggested Alexa health skills provide mediocre health advice in the best-case scenarios. The physicians who reviewed our analysis agreed.

Cate Mackenzie, a family medicine doctor at the Dr. Everett Chalmers Regional Hospital in New Brunswick, Canada, noted in several cases that the treatment advice offered by the Alexa skills was generally fine, but could be dangerously unsuitable for specific groups of patients. WebMD suggested taking an “over-the-counter pain reliever like acetaminophen or ibuprofen” as a treatment for a migraine, for example; Mackenzie pointed out that “there are some patients who should never take acetaminophen or ibuprofen.” Depending on allergies or pre-existing conditions, these medications could cause complications rather than relief. (That’s information you can find fairly easily in the migraine treatment section of WebMD’s website.)

Tian Wang, a neurologist at Georgetown University, also found the answers dismal. “These answers all sound like they just extract information from Wikipedia (which contains a lot of incorrect information) using very simple ‘yes’ or ‘no’ algorithms,” he said in an email. “Based on my judgment, these are all bad responses. If this is the best that Silicon Valley can offer, then it will still take a long way for the robotic machine to take over our jobs.”

Both Mackenzie and Wang were especially troubled by the GYANT skill. Unlike other skills that spit out either a description of a diagnosis or of a possible treatment, GYANT’s detailed questioning gives off an air of authority that users could mistake as expertise.

“At first glance, GYANT seemed smart. However this is the worst type of patient encounter,” Wang says. “During medical school and residency training, we were repeatedly told the worst type of patient encounter is asking suggestive, leading ‘yes’ or ‘no’ questions, as this will lead to the wrong way and cloud your clinical judgment. Open-ended questions are preferred, particularly ones like ‘tell me more about your pain,’ because they don’t suggest where or how a person’s pain should feel.”

Instead of drawing from experience like a doctor does, GYANT uses data to decide on the best questions to ask, according to Pascal Zuta, the co-founder of the company behind the skill. “We work a lot on asking questions the right way,” he says, noting that the company looks at the answers each question elicits and compares that “against statistical data on symptom or condition prevalences” in order to figure out which questions its skill may not understand well.
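As a rough illustration of the kind of feedback loop Zuta describes, the sketch below compares how often users answer “yes” to a question against an assumed prevalence of the corresponding symptom, and flags questions whose answer rates diverge sharply—a possible sign the question is being misunderstood. The symptom names, numbers, and threshold are invented for illustration; this is not GYANT’s actual model.

```python
# Hypothetical prevalence of each symptom in the target population.
PREVALENCE = {"fever": 0.20, "sore_throat": 0.15, "damaged_spleen": 0.001}

# Hypothetical share of users answering "yes" to the question about each symptom.
OBSERVED_YES_RATE = {"fever": 0.22, "sore_throat": 0.14, "damaged_spleen": 0.08}

def flag_confusing_questions(prevalence, observed, ratio_threshold=3.0):
    """Flag questions whose yes-rate differs from prevalence by more than the threshold ratio."""
    flagged = []
    for symptom, expected in prevalence.items():
        actual = observed.get(symptom, 0.0)
        ratio = max(actual, expected) / max(min(actual, expected), 1e-9)
        if ratio > ratio_threshold:
            flagged.append((symptom, expected, actual))
    return flagged

for symptom, expected, actual in flag_confusing_questions(PREVALENCE, OBSERVED_YES_RATE):
    print(f"'{symptom}' question looks off: expected ~{expected:.1%} yes, observed {actual:.1%}")
```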

None of the skills we tested can effectively handle open-ended questions (like “Why does light make my headache worse?”), though Zuta says GYANT recently released a feature that allows users to respond freely and records the responses for future training purposes. The company’s AI, Zuta suggests, may be able to help teach these skills to pick up on patterns from responses to open-ended questions, if the datasets are big and comprehensive enough.

But even if a skill could learn to answer medical questions accurately, that wouldn’t mean doctors are out of a job. The disclaimers in skills’ descriptions should remind users that the skills do not provide doctor-sanctioned advice.

Perhaps the biggest advantage doctors have over any current technology is the physical interaction inherent in an office visit. Seeing a patient in person prompts them to ask questions machines would never even consider—and not to ask pointless or misleading questions that machines might.

For example, a patient might come in with a persistent headache. The doctor could ask the patient if they’re tired, and they may say no. However, in the doctor’s office, other factors like their physical appearance could lead the doctor to assess certain vitals, such as blood pressure. Those readings, in turn, combined with the headache complaints, could lead the doctor to diagnose the patient with a sleep disorder. The same conversation with an Alexa skill might lead the AI to shelve sleep away as a non-problem.

Where Alexa could actually help

It’s easy to see how Alexa, even in her current state, could play a meaningful role in health care today—just not in the diagnostic field. We know artificial intelligence is capable of keeping track of patterns of speech, and it’s not a major leap to imagine voice assistants that could be programmed to ask and learn about their owner’s sleeping, eating, exercising, and language usage patterns, in order to flag when something is wrong. Ideally, the AI could prompt you to go to the doctor, or it could connect straight to the doctor’s office and make the appointment for you.

That, however, would require structural changes in how Alexa and her skills function. At the moment, Alexa isn’t compliant with the Health Insurance Portability and Accountability Act (HIPAA)—which means that in the US, she can’t legally share your health data with anyone, not even your doctor. In June, Amazon launched a service called Neptune to help other organizations build secure apps. Although Amazon declined to comment on Neptune, the current policy on its Web Services page says that it “aligns our HIPAA risk management program.” So maybe changes are coming that would make Alexa valuable in the preventative health care space.

In addition, Amazon appears to now be quietly cultivating an actual health team to work on its voice assistants (the company declined to comment on what specific products the team will work on). If Alexa skills were capable of handling actual conversations—and not just picking up on keywords to generate a response or to find the next branch in a decision tree—it’s possible they could have a place in at-home diagnostics in the near future. They may even one day be a welcome alternative to seeing a human physician, especially for issues people find embarrassing, like sexual health. After all, you might be more honest talking to a non-judgmental machine than to a human, which could lead to more accurate results. “I could really see this being helpful for those people who think they need to come to the ER for athlete’s foot or STI testing,” Mackenzie says.

For now, though, these skills really are nothing more than entertainment—and they’re not very amusing at that.

Originally published in Quartz by Katherine Ellen Foley and Youyou Zhou