How you analyze "expert questions" statistically depends on the type of data you have and what you're trying to find out. Let's break it down with an overview of possible approaches.
🔍 Step 1: Understand the Question Type
What do you mean by “expert question”? Possibilities:
- A question written by an expert (e.g., to test others' knowledge)?
- A question posed to an expert (e.g., in interviews)?
- A question answered by experts, where you’re evaluating the quality/consistency?
- A survey question labeled as “expert-level”?
Once that’s clear, you can move to…
📊 Step 2: Collect the Right Data
Typical data points could include:
- Difficulty ratings (from a panel or AI)
- Number of correct answers by non-experts
- Time taken to answer
- Agreement among experts
- Textual features (e.g., length, keywords, jargon density)
📐 Step 3: Statistical Methods You Can Use
Here are some methods depending on what you’re analyzing:
A. Descriptive Statistics
To describe the question characteristics:
- Mean, median, SD of difficulty rating
- Frequency of topic types
- Length of question (word count, sentence complexity)
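As a rough sketch of these summaries, the snippet below uses only Python's standard library. The ratings, topic labels, and question texts are made-up illustration data, and `describe` is a hypothetical helper name.

```python
import statistics
from collections import Counter

# Hypothetical data: panel difficulty ratings (1-5), topic labels, question text.
ratings = [2, 3, 3, 4, 5, 4]
topics = ["stats", "ml", "stats", "design", "ml", "stats"]
questions = [
    "What is a confidence interval?",
    "Explain the bias-variance tradeoff in your own words.",
]

def describe(values):
    """Return mean, median, and sample standard deviation of a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),  # sample SD (n - 1 denominator)
    }

rating_summary = describe(ratings)
topic_counts = Counter(topics)                      # frequency of each topic
word_counts = [len(q.split()) for q in questions]   # crude length measure

print(rating_summary)
print(topic_counts.most_common())
print(word_counts)
```

Word count from whitespace splitting is only a crude proxy for length; sentence complexity would need a proper readability measure.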
B. Inter-Rater Reliability
If multiple experts rate the same question:
- Cohen’s Kappa (for 2 raters assigning categories)
- Fleiss’ Kappa (for 3+ raters assigning categories)
- ICC (Intraclass Correlation Coefficient) if ratings are on a continuous scale
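To make the two-rater case concrete, here is a minimal sketch of Cohen's kappa written from its definition: observed agreement corrected for the agreement expected by chance. The function name `cohens_kappa` and the "easy"/"hard" labels are assumptions for illustration only.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items with categories."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of items")
    n = len(rater_a)
    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1:
        return float("nan")  # kappa is undefined when chance agreement is total
    return (observed - expected) / (1 - expected)

# Hypothetical ratings of eight questions by two experts.
expert_1 = ["easy", "hard", "hard", "easy", "hard", "easy", "hard", "hard"]
expert_2 = ["easy", "hard", "easy", "easy", "hard", "easy", "hard", "easy"]
kappa = cohens_kappa(expert_1, expert_2)
print(round(kappa, 3))
```

The experts agree on 6 of 8 questions, but kappa comes out well below 0.75 because part of that agreement would be expected by chance.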
C. Item Analysis (common in education/testing)
Used to evaluate questions in a test:
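- Difficulty index (often called the item "p-value", not to be confused with a significance-test p-value): proportion of people who answered it correctly
- Discrimination index: how well the item separates high and low scorers, commonly the difference in proportion correct between the top- and bottom-scoring groups
- Point-biserial correlation: correlation between the item score (0/1) and the total score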
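The sketch below computes the difficulty index and a point-biserial correlation for one item. It assumes responses are stored as a list of rows (one per learner) of 0/1 scores, and it correlates the item with the rest score (total minus that item) so the item does not inflate its own correlation; that correction is a design choice here, not something the list above prescribes. All names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation; returns NaN if either variable is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)

def item_difficulty(responses, item):
    """Proportion of learners who answered the item correctly."""
    return sum(row[item] for row in responses) / len(responses)

def point_biserial(responses, item):
    """Correlation between the 0/1 item score and the rest score."""
    item_scores = [row[item] for row in responses]
    rest_scores = [sum(row) - row[item] for row in responses]
    return pearson(item_scores, rest_scores)

# Hypothetical responses: 6 learners (rows) x 3 items (columns), 1 = correct.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
]
for j in range(3):
    print(j, item_difficulty(responses, j), point_biserial(responses, j))
```

With real data, a point-biserial near zero or negative suggests the item is not measuring the same thing as the rest of the test.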
D. Inferential Statistics
If you want to compare groups (e.g., expert vs non-expert responses):
- t-test / ANOVA: compare mean ratings or performance (t-test for two groups, ANOVA for three or more)
- Chi-square test: compare distributions of categorical variables (e.g., topic frequency)
- Regression Analysis: predict difficulty or accuracy based on features of the question
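For the two-group comparison, a minimal sketch of Welch's t-test (the variant that does not assume equal variances) is shown below. It returns only the t statistic and the Welch-Satterthwaite degrees of freedom; getting a p-value needs the t distribution, for which a library such as SciPy (`scipy.stats.ttest_ind` with `equal_var=False`) is the usual route. The scores are invented for illustration.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    n_a, n_b = len(sample_a), len(sample_b)
    var_a = statistics.variance(sample_a)  # sample variances (n - 1 denominator)
    var_b = statistics.variance(sample_b)
    se_sq = var_a / n_a + var_b / n_b      # squared standard error of the difference
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(se_sq)
    df = se_sq ** 2 / (
        (var_a / n_a) ** 2 / (n_a - 1) + (var_b / n_b) ** 2 / (n_b - 1)
    )
    return t, df

# Hypothetical scores on the same questions for two groups.
expert_scores = [4, 5, 4, 5, 4]
novice_scores = [2, 3, 3, 2, 3]
t_stat, dof = welch_t(expert_scores, novice_scores)
print(round(t_stat, 3), round(dof, 1))
```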
E. Text Analysis + Statistics
If you're analyzing the question text itself:
- Use text mining to extract features (TF-IDF, readability scores)
- Then apply clustering or factor analysis to group questions by those features, or regression (e.g., logistic regression) to relate them to outcomes (like answer accuracy or expert agreement)
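As a simple stand-in for the feature-extraction step, the sketch below computes a few hand-rolled features (word count, average word length, words per sentence, jargon density) rather than TF-IDF or a formal readability score. The `JARGON` set is a hypothetical word list you would replace with domain terms of your own.

```python
import re

# Hypothetical jargon list; replace with terms from your own domain.
JARGON = {"heteroscedasticity", "posterior", "eigenvalue"}

def text_features(question, jargon=JARGON):
    """Extract simple numeric features from a question's text."""
    words = re.findall(r"[A-Za-z']+", question.lower())
    sentences = [s for s in re.split(r"[.!?]+", question) if s.strip()]
    n_words = len(words)
    return {
        "word_count": n_words,
        "avg_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
        "words_per_sentence": n_words / len(sentences) if sentences else 0.0,
        "jargon_density": sum(w in jargon for w in words) / n_words if n_words else 0.0,
    }

features = text_features("How does heteroscedasticity affect the posterior?")
print(features)
```

Each question then becomes a row of numbers that can feed a clustering or regression step.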
🧠 Example Use Case
You have 50 expert-written questions and 100 learners answering them. You want to know which questions are too easy or too hard.
- Calculate the difficulty index (proportion of learners answering correctly) for each question
- Use discrimination index to see which questions best separate high- and low-performing learners
- Use item-total correlation to flag bad items (low or negative values)
- Optional: run a factor analysis to see if items group by topic or skill
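The first two steps of this use case can be sketched as below. The upper-lower discrimination index compares the top and bottom 27% of learners by total score, a conventional split. The cutoffs (`easy`, `hard`, `min_disc`) are illustrative assumptions rather than fixed standards, and the small response matrix stands in for the 50 x 100 data described above.

```python
def analyze_items(responses, easy=0.9, hard=0.2, min_disc=0.2, group_frac=0.27):
    """Flag items by difficulty and upper-lower discrimination.

    responses: list of rows (one per learner) of 0/1 item scores.
    Thresholds are illustrative defaults, not established standards.
    """
    n_learners = len(responses)
    n_items = len(responses[0])
    ranked = sorted(responses, key=sum, reverse=True)  # best total scores first
    k = max(1, round(group_frac * n_learners))         # size of each extreme group
    upper, lower = ranked[:k], ranked[-k:]
    report = []
    for j in range(n_items):
        difficulty = sum(row[j] for row in responses) / n_learners
        discrimination = (sum(r[j] for r in upper) - sum(r[j] for r in lower)) / k
        flags = []
        if difficulty > easy:
            flags.append("too easy")
        if difficulty < hard:
            flags.append("too hard")
        if discrimination < min_disc:
            flags.append("low discrimination")
        report.append({"item": j, "difficulty": difficulty,
                       "discrimination": discrimination, "flags": flags})
    return report

# Hypothetical data: 10 learners x 3 items.
responses = [[1, 1, 1]] + [[1, 1, 0]] * 4 + [[1, 0, 0]] * 5
report = analyze_items(responses)
for row in report:
    print(row)
```

Ties in total score at the group boundaries are broken by input order here, so with many tied learners the extreme groups are somewhat arbitrary.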
