Analyzing the Voice of the Patient in Pharma

Listening to patients has become a rich source of real-world data for the pharmaceutical industry.

A European pharmaceutical company set up a project to collect and analyze the voices of women who share their experiences of assisted reproductive treatment on internet forums.

Online fertility forums have built communities and safe spaces where people can share knowledge and advice.

They offer rich data and provide access to a wealth of patients’ experiences.

Nevertheless, these online spaces force researchers to rethink established ethical principles of privacy and anonymity. The project complied with privacy principles, the rules set by each forum, and ethical standards that guarantee user anonymity.

The sources the company was interested in amounted to over 6 million posts.

Manual review of this data would take several years of full-time work.

Only automatic processing makes it possible to analyze this volume with the required quality, response time, and consistency.

The project step by step


The process begins with web crawling. Although this task can be done manually, the term usually refers to automated collection with a web crawler, a program that systematically visits pages and downloads their content.

To minimize the load our traffic placed on the forums, we configured the crawlers to follow the Robots Exclusion Protocol, honoring the rules in each site’s “robots.txt” file.
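As a rough illustration of how a crawler can honor these rules, the sketch below uses Python’s standard `urllib.robotparser` module to check candidate URLs before fetching them. The forum host, the robots.txt content and the user-agent name `VoPResearchBot` are hypothetical, and the rules are parsed from a string so the example runs offline; this is not the project’s actual crawler.

```python
from urllib import robotparser

# Hypothetical robots.txt content. A real crawler would first download
# https://<forum-host>/robots.txt before requesting any other page.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /search
Crawl-delay: 5
"""

USER_AGENT = "VoPResearchBot"  # hypothetical crawler name


def build_parser(robots_txt: str) -> robotparser.RobotFileParser:
    """Parse robots.txt rules from a string, without touching the network."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed_urls(parser: robotparser.RobotFileParser, urls: list[str]) -> list[str]:
    """Keep only the URLs that the site's rules allow our user agent to fetch."""
    return [u for u in urls if parser.can_fetch(USER_AGENT, u)]


parser = build_parser(ROBOTS_TXT)
candidates = [
    "https://forum.example.org/threads/ivf-stories",
    "https://forum.example.org/private/messages",
    "https://forum.example.org/search?q=ivf",
]
print(allowed_urls(parser, candidates))  # → ['https://forum.example.org/threads/ivf-stories']
print(parser.crawl_delay(USER_AGENT))    # → 5
```

In a real crawl, the delay returned by `crawl_delay()` (or a conservative default when it returns `None`) would be waited between consecutive requests to the same host.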


Our computational linguists then developed and trained a simple classification model with 10 categories, including adverse reactions, expectations, benchmarking, and posology and use.

Our NLP engine then performed the following tasks:

  • It assigned each of the 6 million posts to one or more of the categories.
  • It analyzed the sentiment of each post, assigning one of the following tags: N+ (strongly negative), N (negative), NEU (neutral: neither good nor bad, or positive and negative polarities cancel each other out), P (positive), P+ (strongly positive), or NONE (no sentiment).
  • It extracted the names of brands, molecules, adverse effects, medicines, clinics, places, organizations, and products mentioned in the posts.
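To make the first two tasks concrete, here is a deliberately simplified, lexicon-based sketch. It shows that classification is multi-label (a single post can land in several categories) and how a count of polar words could be mapped onto the six sentiment tags. The cue words, the scoring rule and the thresholds are all hypothetical stand-ins, not the trained model the project used.

```python
import re

# Hypothetical cue words per category; the real model was trained, not hand-written.
CATEGORY_CUES = {
    "adverse reactions": {"headache", "nausea", "bloating", "rash"},
    "expectations": {"hope", "hoping", "expect", "waiting"},
    "benchmarking": {"better", "worse", "compared", "switched"},
    "posology and use": {"dose", "injection", "daily", "mg"},
}
# Hypothetical polarity lexicons.
POSITIVE = {"hope", "hoping", "better", "great", "happy"}
NEGATIVE = {"worse", "nausea", "headache", "awful", "scared"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def classify(text: str) -> list[str]:
    """Multi-label: return every category with at least one cue word in the post."""
    tokens = set(tokenize(text))
    return sorted(cat for cat, cues in CATEGORY_CUES.items() if tokens & cues)


def sentiment_tag(text: str) -> str:
    """Map (positive minus negative) cue counts onto N+, N, NEU, P, P+ or NONE."""
    tokens = tokenize(text)
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    if pos == 0 and neg == 0:
        return "NONE"  # no polar cue words at all
    score = pos - neg
    if score >= 2:
        return "P+"
    if score == 1:
        return "P"
    if score == 0:
        return "NEU"  # positive and negative cues cancel out
    if score == -1:
        return "N"
    return "N+"


post = "Day 3 of the injection, nausea and headache, feeling awful."
print(classify(post), sentiment_tag(post))  # → ['adverse reactions', 'posology and use'] N+
```

A lexicon this small cannot tell a genuinely neutral post from one with no sentiment, so the sketch only produces NEU when positive and negative cues cancel out; a trained model is what makes that finer distinction possible.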

Konplik uses hundreds of thousands of domain-specific semantic resources integrated into our system, each designed to detect a different type of named entity. These resources include MedDRA (the Medical Dictionary for Regulatory Activities), the Unified Medical Language System (UMLS), and the International Statistical Classification of Diseases and Related Health Problems (ICD).
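Dictionary resources like these are commonly applied as gazetteers: the text is scanned for known terms, preferring the longest match so that a multi-word term wins over its parts. The sketch below illustrates that idea with a tiny hypothetical lexicon. The terms and entity labels are illustrative only and are not taken from MedDRA, UMLS or ICD, and Konplik’s actual matching logic is not described here.

```python
import re

# Hypothetical mini-gazetteer mapping token sequences to entity types.
GAZETTEER = {
    ("hot", "flushes"): "ADVERSE_EFFECT",
    ("headache",): "ADVERSE_EFFECT",
    ("ovarian", "hyperstimulation", "syndrome"): "ADVERSE_EFFECT",
    ("progesterone",): "MOLECULE",
    ("madrid",): "PLACE",
}
MAX_LEN = max(len(term) for term in GAZETTEER)


def extract_entities(text: str) -> list[tuple[str, str]]:
    """Greedy longest-match lookup of gazetteer terms in the token stream."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    entities, i = [], 0
    while i < len(tokens):
        # Try the longest possible span first, then shorter ones.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in GAZETTEER:
                entities.append((" ".join(key), GAZETTEER[key]))
                i += n  # skip past the matched term
                break
        else:
            i += 1  # no term starts at this token; move on
    return entities


print(extract_entities("Started progesterone in Madrid, now hot flushes and a headache."))
```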


We then developed a dashboard that presents these results visually. It displays the business’s key performance indicators (KPIs), which are used to measure performance and generate actionable insights. It included the following views:

Relevance of clinics

This view maps the locations of the clinics mentioned by patients. Clicking a spot on the map shows further details.
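Behind a view like this sits a simple aggregation. The sketch below assumes, hypothetically, that relevance is measured as the number of distinct posts mentioning each clinic, and that the entity extraction step has already linked each mention to a city. The `ClinicMention` schema, the clinic names and the cities are placeholders.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ClinicMention:
    """One clinic mention from the entity extraction step (hypothetical schema)."""
    post_id: int
    clinic: str
    city: str


def clinic_relevance(mentions):
    """Rank (clinic, city) pairs by the number of distinct posts mentioning them."""
    # Count each clinic at most once per post, so repeats within a post don't inflate it.
    unique = {(m.post_id, m.clinic, m.city) for m in mentions}
    counts = Counter((clinic, city) for _, clinic, city in unique)
    # Sort by descending count, then alphabetically, for a deterministic order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


mentions = [
    ClinicMention(1, "Clinic A", "Madrid"),
    ClinicMention(1, "Clinic A", "Madrid"),  # same post, counted once
    ClinicMention(2, "Clinic A", "Madrid"),
    ClinicMention(3, "Clinic B", "Lisbon"),
]
print(clinic_relevance(mentions))  # → [(('Clinic A', 'Madrid'), 2), (('Clinic B', 'Lisbon'), 1)]
```

Each ranked pair could then be geocoded and drawn as a spot on the map, with the count shown in the details panel.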


Product-related insights

This interface gathers the mentions of brands and their products. It has sections for stock alerts, adverse effects, and drug sharing alerts.


Drug sharing alerts

Whenever patients try to buy or sell prescription drugs, the dashboard displays quotes from the posts in question.
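A minimal, rule-based way to surface such quotes is to keep the sentences in which a known drug name co-occurs with buying or selling language. The trade vocabulary, the naive sentence splitter and the drug name `DrugX` below are hypothetical; the project itself relied on its NLP engine, whose method is not described here.

```python
import re

# Hypothetical vocabulary signalling an intent to buy, sell or pass on medication.
TRADE_PATTERN = re.compile(
    r"\b(sell|selling|buy|buying|swap|spare|leftover)\b", re.IGNORECASE
)


def drug_sharing_quotes(post: str, drug_names: list[str]) -> list[str]:
    """Return the sentences where a known drug name co-occurs with trade language."""
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", post.strip())
    drugs = [d.lower() for d in drug_names]
    return [
        s for s in sentences
        if TRADE_PATTERN.search(s) and any(d in s.lower() for d in drugs)
    ]


post = ("Finished my cycle. I have two leftover DrugX pens, happy to sell them. "
        "Good luck everyone!")
print(drug_sharing_quotes(post, ["DrugX"]))  # → ['I have two leftover DrugX pens, happy to sell them.']
```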


Adverse effects

The system monitors the adverse events that patients report and assesses their impact.
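One hypothetical way to monitor adverse events over time is to count mentions per product and effect each week and flag sudden increases against a recent baseline. In the sketch below, the weekly granularity, the four-week baseline, the 2x ratio and the minimum count are illustrative assumptions, and `DrugX` and `DrugY` are placeholder product names.

```python
from collections import defaultdict
from statistics import mean


def weekly_counts(events):
    """Count mentions per (product, effect, week); events are (week, product, effect)."""
    counts = defaultdict(int)
    for week, product, effect in events:
        counts[(product, effect, week)] += 1
    return counts


def flag_spikes(counts, week, baseline_weeks=4, ratio=2.0, min_count=3):
    """Flag pairs whose count in `week` is at least `ratio` times their recent average."""
    alerts = []
    for product, effect in sorted({(p, e) for p, e, _ in counts}):
        current = counts.get((product, effect, week), 0)
        history = [counts.get((product, effect, w), 0)
                   for w in range(week - baseline_weeks, week)]
        baseline = mean(history)
        # max(..., 1.0) keeps a zero baseline from making the ratio test trivially true.
        if current >= min_count and current >= ratio * max(baseline, 1.0):
            alerts.append((product, effect, current, baseline))
    return alerts


events = (
    [(w, "DrugX", "headache") for w in range(1, 5)]                   # 1 mention per week
    + [(5, "DrugX", "headache")] * 4                                   # jumps to 4 in week 5
    + [(w, "DrugY", "nausea") for w in range(1, 5) for _ in range(2)]  # 2 per week
    + [(5, "DrugY", "nausea")] * 3                                     # 3 vs a baseline of 2
)
print(flag_spikes(weekly_counts(events), week=5))
```

Only the DrugX headache pair is flagged here, since its week-5 count is four times its baseline. A flag like this signals that patients are discussing an effect more often; it is a prompt for review rather than a measure of clinical impact.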

Ultimately, the Voice of the Patient project gives both the commercial and scientific departments of the pharmaceutical company real-time insight, so they can now analyze key data quickly.


Contact us!