Codifying Semantic Information in Medical Questions Using Lexical Sources
Paul E. Pancoast, MD, Arthur B. Smith, MS, Chi-Ren Shyu, PhD
University of Missouri-Columbia
Methods:
Source Questions (4083 test questions):
University of Iowa researchers gathered questions from clinicians during observation in a clinical setting. Ely JW, Osheroff JA, Ebell MH, et al. Analysis of questions asked by family doctors regarding patient care. BMJ. Aug 7 1999;319(7206):358-361.
British researchers answer questions submitted by clinicians and post evidence-based answers on the web. http://www.attract.wales.nhs.uk/about/one.htm
Australian researchers answer questions submitted by clinicians and post evidence-based answers on the web. http://ww.sph.uq.edu.au/CGP/red/quest/faqs.asp
Source Vocabulary:
MRCON: a table from the Metathesaurus (2003AB)
Lists the medical concepts by unique identifiers (CUI)
Lists each string (word or phrase) associated with the concepts
UNIQUE (string => 1 concept)
AMBIGUOUS (string => 2+ concepts)
COLD: 1) ambient temperature, 2) viral upper respiratory infection, 3) Chronic Obstructive Lung Disease
2,247,454 strings associated with concepts
1,860,680 unique strings
900,550 unique CUI
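The UNIQUE/AMBIGUOUS distinction above can be sketched in a few lines. This is only an illustration: the (string, CUI) rows and the concept identifiers below are made up for the example and are not real Metathesaurus entries, and `classify_strings` is a hypothetical helper name.

```python
from collections import defaultdict

# Hypothetical (string, CUI) rows; identifiers are invented for illustration.
rows = [
    ("cold", "C_AMBIENT_TEMPERATURE"),
    ("cold", "C_VIRAL_URI"),
    ("cold", "C_CHRONIC_OBSTRUCTIVE_LUNG_DISEASE"),
    ("acute pharyngitis", "C_PHARYNGITIS"),
    ("sore throat", "C_PHARYNGITIS"),
]

def classify_strings(rows):
    """Label each string UNIQUE (1 concept) or AMBIGUOUS (2+ concepts)."""
    concepts = defaultdict(set)
    for string, cui in rows:
        concepts[string.lower()].add(cui)
    return {
        s: ("UNIQUE" if len(cuis) == 1 else "AMBIGUOUS")
        for s, cuis in concepts.items()
    }

labels = classify_strings(rows)
# Share of distinct strings that map to more than one concept.
ambiguous_rate = sum(v == "AMBIGUOUS" for v in labels.values()) / len(labels)
```

Note that two different strings may share one concept ("acute pharyngitis" and "sore throat" in this toy data) and still each count as UNIQUE, since ambiguity is defined per string.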
Non-medical Lexicon generated from Roget's Thesaurus
Query objects (why, when, how), identifiers (I, you, he), modifiers (soon, frequently), action/relationship (treats, attends, reduce, lessen, can, improve)
749 terms in this lexicon
Parsing Application:
Source vocabulary => 37-ary tree structure
Indexed by sequences of characters in the string (a-z0-9 and <space>) after lower-casing and removal of punctuation
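A minimal sketch of such a 37-ary tree, assuming one child slot per character (26 letters, 10 digits, 1 space). The class and function names are hypothetical, and the normalization details (for example, collapsing repeated whitespace) are assumptions rather than the application's documented behavior.

```python
import string

ALPHABET = string.ascii_lowercase + string.digits + " "  # 26 + 10 + 1 = 37
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}

def normalize(text):
    """Lower-case, drop characters outside a-z0-9/space, collapse whitespace."""
    kept = "".join(ch for ch in text.lower() if ch in INDEX or ch.isspace())
    return " ".join(kept.split())

class Node:
    __slots__ = ("children", "payload")

    def __init__(self):
        self.children = [None] * len(ALPHABET)  # one slot per symbol
        self.payload = None  # set only where a complete vocabulary string ends

class Trie:
    def __init__(self):
        self.root = Node()

    def insert(self, text, payload):
        """Store payload (e.g. a source label) at the end of the string's path."""
        node = self.root
        for ch in normalize(text):
            i = INDEX[ch]
            if node.children[i] is None:
                node.children[i] = Node()
            node = node.children[i]
        node.payload = payload

    def lookup(self, text):
        """Return the payload for an exact match, or None."""
        node = self.root
        for ch in normalize(text):
            node = node.children[INDEX[ch]]
            if node is None:
                return None
        return node.payload

trie = Trie()
trie.insert("Acute Pharyngitis", "UNIQUE")
```

Lookup cost depends only on the length of the query string, not on the size of the vocabulary, which is one plausible reason to prefer this layout for a vocabulary with millions of strings.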
Source questions
Examined in 3-word, 2-word, 1-word windows for matches with the source vocabulary
{what is the} best treatment for acute pharyngitis
What {is the best} treatment for acute pharyngitis
. . .
What is the best treatment for {acute pharyngitis} !!Match
. . .
What is the best {treatment} for xxxxxxxxx !!Match
Generates a report of:
Total number of words parsed
Number of matches from UNIQUE, AMBIGUOUS, NON-MEDICAL LEXICON
Strings that didn't match the source vocabularies
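The window scan and report described above might look roughly like the following. It is a sketch under stated assumptions: the vocabulary is a tiny hypothetical dictionary rather than the tree built from MRCON and the non-medical lexicon, and matched words are masked so shorter windows cannot reuse them (mirroring the xxxxxxxxx placeholder in the example).

```python
import re
from collections import Counter

# Hypothetical miniature vocabulary: string -> source it came from.
VOCAB = {
    "acute pharyngitis": "UNIQUE",
    "treatment": "UNIQUE",
    "cold": "AMBIGUOUS",
    "what": "NON-MEDICAL",
    "is": "NON-MEDICAL",
    "for": "NON-MEDICAL",
}

def parse_question(question, vocab, max_window=3):
    """Scan with 3-, 2-, then 1-word windows, masking words once matched."""
    words = re.sub(r"[^a-z0-9\s]", "", question.lower()).split()
    used = [False] * len(words)
    matches = []
    for size in range(max_window, 0, -1):
        for start in range(len(words) - size + 1):
            span = range(start, start + size)
            if any(used[i] for i in span):
                continue  # part of this window already matched a longer string
            phrase = " ".join(words[start:start + size])
            if phrase in vocab:
                matches.append((phrase, vocab[phrase]))
                for i in span:
                    used[i] = True
    unmatched = [w for w, u in zip(words, used) if not u]
    return len(words), matches, unmatched

def report(question, vocab):
    """Summarize words parsed, matches per source, and unmatched strings."""
    total, matches, unmatched = parse_question(question, vocab)
    return {
        "words_parsed": total,
        "matches_by_source": Counter(source for _, source in matches),
        "unmatched": unmatched,
    }

result = report("What is the best treatment for acute pharyngitis?", VOCAB)
```

With this toy vocabulary, "the" and "best" end up in the unmatched list, which is the kind of residue the report is meant to surface.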
Results:
(unmatched words, 2+ occurrences)
We suspect that some unmatched words will be important to determine the meaning of a medical question, particularly relationship words (verbs)
Discussion:
MRCON was selected for its relatively low rate of ambiguous strings (11%). Although other tables have a larger number of strings, they have much higher ambiguous rates.
Other researchers have similar results matching biomedical text to controlled vocabularies
Cimino et al. matched 43% of words with Meta-1 (we had 56% Metathesaurus matches). Computers and Biomedical Research. Aug 1992;25(4):366-373.
Hersh et al. matched 60% of words to a medical terminology and names dictionary (we had 79% combined lexicon matches). Proceedings/AMIA Annual Fall Symposium. 1997
Stop words are commonly removed by most normalization tools (prepositions, conjunctions, pronouns, etc.)
They provide valuable contextual information in medical questions
Blood FOR an HIV-positive patient - much different than Blood FROM an HIV-positive patient
Patient taking aspirin AND coumadin - more likely to bleed than Patient taking aspirin OR coumadin
Integers are difficult to manage (discarded in MetaMap) but also provide valuable discriminatory information
Patient with hyperkalemia of 5.1 mEq/L - a concern but not critical
Patient with hyperkalemia of 8.9 mEq/L - either a lab error or dead by now
Verbs: Action/Relation Concepts
Not listed in the Metathesaurus
Some included in our Non-medical lexicon
Verb strings => concepts very fluid and VERY ambiguous: how many concepts can be represented by USE?
Relation concepts may be conceptually related to entity/event concepts, but they are not equivalent
Diagnose => Diagnosis
Treat => Treatment
Evaluate => Evaluation
Verb tense changes the meaning of a question
in a patient TAKING antibiotics
in a patient who TOOK antibiotics
Research Purpose:
To find a method for classifying medical questions that are asked by clinicians.
Hypothesis:
Simply indexing questions by keyword isn't sufficient to
Distinguish questions with different meanings but similar wording
Group questions with different words but similar meanings
Examples:
Different words: What is the best way to treat acute pharyngitis in healthy children?
Similar meaning: How do you approach a normal pediatric patient with a sore throat?
Similar words: How do you deal with diabetic patients who are resistant to insulin?
Different meanings: How do you deal with diabetic patients who are resistant to taking insulin?
Why Bother? (to classify medical questions?)
Clinicians often have questions when treating patients
Researchers have gathered collections of these questions
There is no good method to classify the questions in these collections
How many times has a particular question been asked? And has a similar question already been definitively answered (using evidence-based methods)?
Which questions should receive priority when evidence-based answers are written?
How should a database of questions with evidence-based answers be indexed for retrieval?
What kind of search capabilities should it have?

100 unique strings: 7850 occurrences, 57.6% of total matches
712 unique strings: 3+ hits, 85% of total matches
19.1% of words didn't match the source vocabulary
If "acute pharyngitis" is found 12 times => 1 string, 12 hits, 24 words
Words: the total number of matching words
Hits: how often individual strings were found
String: individual word or phrase that matched the source vocabulary
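The three counts defined above (strings, hits, words) can be computed from a list of matched strings as in the sketch below. The function names are hypothetical, and `top_share` is an assumed way of deriving figures such as "share of total matches covered by the N most frequent strings"; it expects a non-empty list.

```python
from collections import Counter

def summarize(matched_strings):
    """Return (unique strings, total hits, total matching words)."""
    hits = Counter(matched_strings)
    n_strings = len(hits)
    n_hits = sum(hits.values())
    n_words = sum(len(s.split()) * count for s, count in hits.items())
    return n_strings, n_hits, n_words

def top_share(matched_strings, n):
    """Fraction of all hits accounted for by the n most frequent strings."""
    hits = Counter(matched_strings)
    top = sum(count for _, count in hits.most_common(n))
    return top / sum(hits.values())

# The example from the text: "acute pharyngitis" found 12 times.
summary = summarize(["acute pharyngitis"] * 12)  # (1, 12, 24)
```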
Basic Structure of the B Tree
Information Flow Diagram
Acknowledgements:
The authors gratefully acknowledge Jon Brassey, TRIP Database & Director, ATTRACT Wales (UK), the Medical School of the University of Queensland (Australia), the Centre for Reviews and Dissemination, University of York (UK), and Dr. John Ely for the kind donation of clinical question sets.
Dr. Pancoast acknowledges the National Library of Medicine Biomedical and Health Informatics Research Training Grant 2-T15-LM07089-11.
Summary:
We developed an application to:
Extract medical concepts from natural language text
Map these medical concepts to a controlled vocabulary
This is similar to the MetaMap application, but with a different purpose (to represent the meaning of medical questions rather than to extract search terms from medical text)
We used these codified representations to match questions using vector calculations (G. Salton, A. Wong, and C.S. Yang. A vector space model for automatic indexing. Communications of the ACM, 18(11), 1975, pp. 613-620)
The major hindrance to satisfactory matching and clustering is the lack of representation of relations between medical concepts
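As an illustration of matching questions by vector calculations, the sketch below compares bags of concept codes with cosine similarity. The poster does not state which similarity measure or term weighting was used, so cosine over raw counts is an assumption here, and the concept codes are invented placeholders.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (concept -> count)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)

# Hypothetical codified questions (concept codes are placeholders).
pharyngitis_q = Counter(["C_TREATMENT", "C_PHARYNGITIS", "C_CHILD"])
sore_throat_q = Counter(["C_TREATMENT", "C_PHARYNGITIS", "C_CHILD"])
insulin_resistance_q = Counter(["C_DIABETES", "C_PATIENT", "C_RESISTANCE", "C_INSULIN"])
refuses_insulin_q = Counter(["C_DIABETES", "C_PATIENT", "C_RESISTANCE", "C_INSULIN"])

same_meaning = cosine(pharyngitis_q, sore_throat_q)
unrelated = cosine(pharyngitis_q, insulin_resistance_q)
indistinguishable = cosine(insulin_resistance_q, refuses_insulin_q)
```

In this toy setup the two insulin questions receive identical vectors because "taking" has no concept of its own, so they score as a perfect match despite their different meanings. That is one way to picture why missing relation concepts hinder matching and clustering.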