greensmg: Analyzing "expert questions" statistically

Thursday, April 17, 2025


How you analyze "expert questions" statistically depends largely on the type of data you have and what you want to find out. Let's break it down into steps and go through the possible approaches.

🔍 Step 1: Understand the Question Type

First, be clear about what “expert question” means in your case. Possibilities:

A question written by an expert (e.g., to test others' knowledge)?
A question posed to an expert (e.g., in interviews)?
A question answered by experts, where you’re evaluating the quality or consistency of their answers?
A survey question labeled as “expert-level”?

Once that’s clear, you can move to…

📊 Step 2: Collect the Right Data

Typical data points could include:

Difficulty ratings (from a panel or AI)
Number of correct answers by non-experts
Time taken to answer
Agreement among experts
Textual features (e.g., length, keywords, jargon density)

📐 Step 3: Statistical Methods You Can Use

Here are some methods depending on what you’re analyzing:

A. Descriptive Statistics

To describe the question characteristics:

Mean, median, and standard deviation (SD) of difficulty ratings
Frequency of each topic
Length of question (word count, sentence complexity)
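These descriptive statistics take only a few lines to compute. The sketch below uses Python's standard `statistics` module on made-up example records; the field names (`text`, `topic`, `rating`) and the 1-5 rating scale are assumptions for illustration, and sentence complexity is left out.

```python
import statistics
from collections import Counter

# Made-up example data for illustration only: each record holds a question's
# text, its topic and a difficulty rating on an assumed 1-5 scale.
questions = [
    {"text": "Explain the role of the Krebs cycle in ATP production.", "topic": "biology", "rating": 4},
    {"text": "What is the derivative of x squared?", "topic": "math", "rating": 1},
    {"text": "Derive the closed-form solution for ordinary least squares.", "topic": "math", "rating": 5},
    {"text": "Which organelle contains chlorophyll?", "topic": "biology", "rating": 2},
]


def describe(records):
    """Return basic descriptive statistics for a list of question records."""
    ratings = [r["rating"] for r in records]
    # Word count as a crude length measure (whitespace-separated tokens).
    word_counts = [len(r["text"].split()) for r in records]
    return {
        "mean_rating": statistics.mean(ratings),
        "median_rating": statistics.median(ratings),
        # Sample standard deviation (n - 1 in the denominator).
        "sd_rating": statistics.stdev(ratings),
        "topic_counts": Counter(r["topic"] for r in records),
        "mean_word_count": statistics.mean(word_counts),
    }


summary = describe(questions)
for name, value in summary.items():
    print(name, value)
```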

B. Inter-Rater Reliability

If multiple experts rate the same question:

Cohen’s kappa (for 2 raters, categorical ratings)
Fleiss’ kappa (for 3+ raters, categorical ratings)
ICC (Intraclass Correlation Coefficient) if ratings are on a continuous scale
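For categorical ratings, both kappa statistics compare the observed agreement with the agreement expected by chance. Below is a minimal from-scratch sketch, assuming labels are plain Python values and, for Fleiss' kappa, a count matrix in which every question is rated by the same number of raters; the example data are invented. ICC is not shown because its value depends on which ICC form (model, type and unit) you choose.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical labels to the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty list of items")
    n = len(rater_a)
    # Observed agreement: share of items where both raters chose the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement implied by each rater's own label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when both raters use a single identical label")
    return (p_o - p_e) / (1 - p_e)


def fleiss_kappa(counts):
    """Fleiss' kappa; counts[i][j] = number of raters who put item i in category j.

    Every item must be rated by the same number of raters.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number (at least 2) of raters")
    # Share of all rating assignments that fell into each category.
    p_j = [sum(col) / (n_items * n_raters) for col in zip(*counts)]
    # Per-item agreement: fraction of rater pairs that agree on that item.
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1)) for row in counts]
    p_bar = sum(p_i) / n_items
    p_e = sum(p * p for p in p_j)
    if p_e == 1:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_bar - p_e) / (1 - p_e)


# Two raters labelling five questions as "easy" or "hard" (made-up data).
print(cohens_kappa(["easy", "hard", "hard", "easy", "hard"],
                   ["easy", "hard", "easy", "easy", "hard"]))
# Three raters, four questions, two categories (made-up data).
print(fleiss_kappa([[3, 0], [0, 3], [2, 1], [1, 2]]))
```

On the sample data, Cohen's kappa comes out at about 0.62 and Fleiss' kappa at about 0.33. For real analyses, an established implementation such as `sklearn.metrics.cohen_kappa_score` is a sensible cross-check.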

C. Item Analysis (common in education/testing)

Used to evaluate questions in a test:

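Difficulty index (often called the item p-value, not to be confused with a significance-test p-value): proportion of people who answered the item correctly
Discrimination index (D): proportion correct in the high-scoring group minus proportion correct in the low-scoring group (e.g., top vs bottom 27% by total score)
Point-biserial correlation: correlation between the 0/1 item score and the total score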
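All three item statistics can be computed from a 0/1 response matrix (one row per learner, one column per question). This is a minimal sketch of classical item analysis, assuming dichotomous scoring; the upper/lower group fraction of 27% is a common convention rather than a requirement, and the helper names are made up for this example.

```python
import statistics


def pearson(x, y):
    """Pearson correlation; returns NaN if either variable is constant."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / (sxx * syy) ** 0.5


def item_analysis(responses, group_fraction=0.27):
    """Classical item analysis on a 0/1 matrix: responses[s][i] == 1 if
    learner s answered item i correctly."""
    n_learners = len(responses)
    totals = [sum(row) for row in responses]
    # Rank learners by total score; ties keep their input order (arbitrary).
    order = sorted(range(n_learners), key=lambda s: totals[s])
    g = max(1, round(group_fraction * n_learners))
    low, high = order[:g], order[-g:]
    results = []
    for i in range(len(responses[0])):
        item = [row[i] for row in responses]
        results.append({
            # Difficulty index: proportion of learners who answered correctly.
            "difficulty": sum(item) / n_learners,
            # Discrimination index D: proportion correct in the high group
            # minus proportion correct in the low group.
            "discrimination": (sum(item[s] for s in high) - sum(item[s] for s in low)) / g,
            # Point-biserial: Pearson correlation of the 0/1 item score with
            # the total score (which here still includes the item itself).
            "point_biserial": pearson(item, totals),
        })
    return results


# Made-up responses: 4 learners (rows) x 3 items (columns).
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
for i, stats in enumerate(item_analysis(responses)):
    print(i, stats)
```

Because the total here includes the item itself, the point-biserial is somewhat inflated, especially when there are few items; a corrected version subtracts the item from the total before correlating.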

D. Inferential Statistics

If you want to compare groups (e.g., expert vs non-expert responses):

t-test / ANOVA: compare mean ratings or performance (t-test for two groups, ANOVA for three or more)
Chi-square test: compare distributions of categorical variables (e.g., topic frequency across groups)
Regression analysis: predict difficulty or accuracy from features of the question
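As a sketch of the group comparisons, the code below computes Welch's t statistic (a t-test variant that does not assume equal variances, chosen here as an assumption) and the Pearson chi-square statistic for a contingency table, using only the standard library. The data are invented, and ANOVA and regression are not shown.

```python
import statistics


def welch_t(a, b):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    # Squared standard error of each sample mean (sample variance / n).
    se2_a = statistics.variance(a) / len(a)
    se2_b = statistics.variance(b) / len(b)
    t = (statistics.fmean(a) - statistics.fmean(b)) / (se2_a + se2_b) ** 0.5
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1))
    return t, df


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Made-up data: difficulty ratings given by experts vs non-experts.
t, df = welch_t([4, 5, 4, 5], [2, 3, 2, 3])
print(f"Welch t = {t:.3f}, df = {df:.1f}")

# Made-up counts: rows = expert vs non-expert authors, columns = topic A vs topic B.
chi2, chi_df = chi_square_independence([[10, 20], [20, 10]])
print(f"chi-square = {chi2:.3f}, df = {chi_df}")
```

To turn these statistics into p-values you need the t and chi-square distributions, for example `scipy.stats.t.sf` and `scipy.stats.chi2.sf`. Note that `scipy.stats.chi2_contingency` applies Yates' continuity correction to 2 x 2 tables by default, so its statistic will differ from the uncorrected one computed here.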

E. Text Analysis + Statistics

If you're analyzing the question text itself:

Use text mining to extract features (TF-IDF, readability scores)
Then apply clustering, factor analysis, or logistic regression to relate these to outcomes (like answer accuracy or expert agreement)
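Here is a small sketch of the feature-extraction step: simple surface features plus a plain TF-IDF computed by hand. The jargon list, the letters-only tokenizer and the unsmoothed IDF are all assumptions for illustration; readability formulas are left out because they need a syllable counter.

```python
import math
import re
from collections import Counter

# Hypothetical jargon list; in practice this might come from a domain glossary.
JARGON = {"eigenvalue", "posterior", "heteroscedasticity"}


def tokenize(text):
    """Lower-case the text and keep runs of letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def text_features(text):
    """A few simple surface features of one question."""
    words = tokenize(text)
    if not words:
        return {"word_count": 0, "mean_word_length": 0.0, "jargon_density": 0.0}
    return {
        "word_count": len(words),
        "mean_word_length": sum(len(w) for w in words) / len(words),
        "jargon_density": sum(w in JARGON for w in words) / len(words),
    }


def tf_idf(documents):
    """Unsmoothed TF-IDF: (count / document length) * log(N / document frequency)."""
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


questions = ["What is a posterior distribution?", "What is an eigenvalue?"]
print([text_features(q) for q in questions])
print(tf_idf(questions))
```

Libraries differ in their TF-IDF defaults; scikit-learn's `TfidfVectorizer`, for instance, smooths the IDF and L2-normalizes each vector by default, so its numbers will not match these. The resulting feature vectors could then feed the clustering, factor analysis or logistic regression step.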

🧠 Example Use Case

You have 50 expert-written questions and 100 learners answering them. You want to know which questions are too easy or too hard.

Calculate the difficulty index (p-value) for each question
Use the discrimination index to see which questions best separate high- and low-performing learners
Use the item-total correlation to flag weak items: a low or negative value often points to an ambiguous or miskeyed question
Optional: run a factor analysis to see if items group by topic or skill
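The screening part of this use case can be sketched as a single function. This is a minimal illustration, assuming 0/1 scoring; the cut-offs (difficulty above 0.90 or below 0.20, corrected item-total correlation below 0.20) are rules of thumb that vary between sources, and `flag_items` is a hypothetical name. It uses the corrected item-total correlation, which leaves each item out of the total before correlating.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation; returns NaN if either variable is constant."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / (sxx * syy) ** 0.5


def flag_items(responses, too_easy=0.90, too_hard=0.20, min_r=0.20):
    """Flag weak items in a 0/1 response matrix (rows = learners, columns = items).

    The default thresholds are assumed rules of thumb, not fixed standards.
    """
    totals = [sum(row) for row in responses]
    flags = {}
    for i in range(len(responses[0])):
        item = [row[i] for row in responses]
        p = statistics.fmean(item)  # difficulty index
        # Corrected item-total correlation: correlate the item with the total
        # score on the other items, so it is not correlated with itself.
        rest = [total - x for total, x in zip(totals, item)]
        r = pearson(item, rest)
        reasons = []
        if p > too_easy:
            reasons.append("too easy")
        if p < too_hard:
            reasons.append("too hard")
        if math.isnan(r) or r < min_r:
            reasons.append("low or undefined item-total correlation")
        if reasons:
            flags[i] = reasons
    return flags


# Made-up data: 6 learners x 4 items. Item 0 is answered correctly by everyone;
# item 3 runs against the other items, as a miskeyed question might.
responses = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [1, 0, 0, 0],
]
print(flag_items(responses))
```

With the sample data, item 0 is flagged as too easy (everyone answered it, so its correlation is undefined) and item 3 for its negative corrected item-total correlation (about -0.33); items 1 and 2 pass. With 50 questions and 100 learners, `responses` would be a 100 x 50 matrix.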