# Business Context
## Unstructured Clinical Notes
An estimated 80% of electronic health/medical record (EHR/EMR) data is unstructured. This data includes medical imaging (e.g., X-ray, CT, MRI) and free-form text from healthcare providers (e.g., clinical notes, nurse admission notes, physician discharge summaries). Its volume is growing rapidly with the rise of digital and telehealth engagement, which generates new data such as patient-reported outcomes and visit transcripts. This deluge has become a major challenge for medical reviewers and coders, creating a critical resource gap that prevents them from promptly processing incoming medical texts.

This challenge is particularly acute given the increasing demands of value-based care (VBC) programs. Written notes often capture rich information about the severity of patients' conditions, interventions, clinical recommendations, and even socioeconomic status. In contrast, structured medical codes from claims or EHR data, which are often designed for billing and transactional purposes, fail to represent a complete view of a patient's health. Clinical and administrative staff can find more comprehensive information for patient management by reviewing these notes, but the process is highly manual and time-consuming.

With the recent development of generative AI, Large Language Models (LLMs) have displayed an impressive ability to perform numerous tasks in a medical context, such as summarizing doctor-patient transcripts, answering medical licensing examination questions, and aiding in medical consultations. However, any automation or AI used to accelerate this process must include human-in-the-loop verification and validation. This is especially critical because foundation models still perform suboptimally at extracting medical codes from text.

Reviewing and acting on this medical information is critical to ensure:
-   Optimal patient outcomes delivered by healthcare providers.
-   Correct diagnoses, appropriate transitions of care, rehospitalization prevention, and accurate medical history capture for future healthcare needs.
-   Assignment of correct and validated medical codes for providers and payers to streamline accurate reimbursement and coverage.
-   High-quality data that leads to better patient insights, improved health services, and the development of new therapies by life sciences organizations.

# Case Study
## Building a Generative AI-Powered Medical Coding Pipeline with Dataiku's Medical Entity Extraction Assistant Solution
This Solution facilitates medical coding from clinical notes through a semi-automated, human-in-the-loop workflow.

### Project Background
A medical coding team at a U.S. healthcare system reviewed physicians' notes and assigned appropriate billing codes based on medical diagnoses and history. Their process relied on legacy, rule-based software to validate billing codes submitted by physicians for each visit. With the system's transition to a value-based care model, the clinical operations team required a more holistic approach to capture each patient's complete health status and disease burden.

By deploying Dataiku's Medical Entity Extraction Assistant Solution, the medical coding team established a generative AI-powered pipeline that extracts critical clinical information from each clinical note and maps it to the relevant billing codes. Coders now use the Solution's integrated web app to review, modify, and verify the AI-suggested billing codes, supporting both speed and accuracy. The new pipeline captures nuanced clinical insights from unstructured notes, giving the team a richer and more accurate understanding of its patient population.

### Initial Situation
**Current workflow and challenges**
-   Relied on a rule-based system to validate billing codes submitted by physicians for each clinical visit.
-   Required time-consuming manual patient chart reviews to identify key clinical events and establish a patient profile.
-   Lacked an efficient communication channel between medical coders and clinicians for query resolution.

**Data**
This Solution requires a clinical note file as the input dataset.

This Solution prepackages several data sources:
-   ICD-10-CM and ICD-10-PCS codes, labels, and definitions retrieved from the [UMLS API](https://documentation.uts.nlm.nih.gov/).
-   [Clinical Classifications Software Refined (CCSR) for diagnoses and procedures](https://hcup-us.ahrq.gov/toolssoftware/ccsr/ccs_refined.jsp).

For more details about the data model, please read the [Data Model article](article:14).

**Goals:**
-   Accelerate the medical coding review process to reduce coder backlog and manual effort.
-   Improve the accuracy and comprehensiveness of medical coding by extracting key conditions and events directly from unstructured clinical notes.
-   Establish an efficient, human-in-the-loop workflow for coders to review, edit, and validate AI-suggested codes, ensuring quality and compliance.
-   Create a higher-quality, more complete dataset that better represents patient disease burden and supports value-based care initiatives.

### Insights
**Install the Solution with a Dataiku Application:**
1.  **Ingest the clinical note file:** Upload the file containing clinical notes and build the pipeline.
2.  **Review and verify the model-assigned billing codes:** Launch the web app to review, modify, and verify the billing codes for each visit.
3.  **Monitor model metrics on the dashboard:** Use the dashboard panel to let the data team monitor model performance.

Please review the [Walkthrough](article:8) for setup details.

### Business Impact
By implementing Dataiku's Medical Entity Extraction Assistant Solution, the medical coding team can:

* **Increase Coder Productivity and Efficiency:** Automate the initial extraction of clinical entities and their mapping to billing codes, allowing coders to shift their focus from manual data entry to higher-value review and validation tasks. This reduces the time spent per note and helps clear processing backlogs.
* **Enhance Coding Accuracy and Comprehensiveness:** Analyze the full text of clinical notes to capture a more complete and nuanced view of a patient's health status, identifying conditions that rule-based systems or structured data might miss. This supports more accurate risk adjustment and better data for VBC programs.
* **Improve Reimbursement and Financial Outcomes:** Ensure billing codes accurately reflect the full scope of patient care documented in clinical notes, reducing claim denials and securing appropriate reimbursement for services.
* **Empower Staff with an Integrated Workflow:** Provide two user-friendly web applications that allow medical coders to easily review, accept, or modify AI-generated suggestions. Coders maintain full human oversight and control while benefiting from AI-driven acceleration. The workflow also creates a direct channel for coders to formulate specific, evidence-based queries for clinicians, streamlining the resolution of documentation ambiguities.
* **Generate High-Quality Data for Analytics:** Create a rich, structured dataset derived from unstructured notes, which can be used for advanced analytics, patient population health management, and clinical research.

### Conclusion
The implementation of Dataiku's Medical Entity Extraction Assistant Solution transformed the healthcare system's medical coding process, moving it from a slow, manual, rule-based operation to an efficient, AI-augmented workflow. By leveraging generative AI within a secure, human-in-the-loop framework, the organization boosted coder productivity and improved the accuracy of its medical coding. The Solution streamlined its reimbursement cycle and provided a more comprehensive understanding of patient health, a critical asset for succeeding in a value-based care environment. By bridging the gap between unstructured clinical data and actionable insights, the Solution delivers tangible value to both administrative and clinical teams.