Fact-Checking Q&A Chatbot

A fact-checker has to do more than attach a label to a sentence. It needs to find relevant prior claims, distinguish wording from meaning, and show enough context for the verdict to be inspected. FactCheckLIAR treats those as separate, replaceable stages rather than one opaque prediction.

Truth Is Not Binary

The LIAR dataset records short political statements together with speaker, party, context, and truthfulness history. Its six ordered labels preserve the uncertainty that a true-or-false interface would erase.

Manually labelled claims: about 12,800
Truthfulness labels: 6
Retrieval channels: BM25 + FAISS
Classifier: BERT

Each claim also carries contextual metadata that can be inspected alongside the retrieved statement.

01 Pants on fire: No factual basis

02 False: Factually incorrect

03 Barely true: A trace of truth, but misleading

04 Half true: Partially correct

05 Mostly true: Minor inaccuracies remain

06 True: Supported by the record

The LIAR label space is an ordered spectrum. Most of the useful distinctions sit between the two endpoints.
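
Because the labels are ordered, they can be treated as an ordinal scale rather than six unrelated classes. The sketch below is illustrative only: the `Veracity` enum and the `label_distance` helper are hypothetical names, not part of the project.

```python
from enum import IntEnum


class Veracity(IntEnum):
    """Six LIAR labels, ordered from least to most truthful."""
    PANTS_ON_FIRE = 1
    FALSE = 2
    BARELY_TRUE = 3
    HALF_TRUE = 4
    MOSTLY_TRUE = 5
    TRUE = 6


def label_distance(a: Veracity, b: Veracity) -> int:
    """Number of steps separating two labels on the ordered scale."""
    return abs(int(a) - int(b))


# A near miss is a smaller error than confusing the two endpoints.
print(label_distance(Veracity.HALF_TRUE, Veracity.MOSTLY_TRUE))  # 1
print(label_distance(Veracity.PANTS_ON_FIRE, Veracity.TRUE))     # 5
```

A distance of this kind makes it possible to report how far a wrong prediction landed from the reference label, which plain accuracy hides.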

Evidence Before Verdict

The application makes the path from a submitted claim to a readable answer explicit. Every stage produces an artefact that the next stage can use and the interface can expose.

01 Read the claim: Preserve names, phrases, and the exact wording of the query. Artefact: Query

02 Retrieve evidence: Combine lexical overlap with semantic similarity across LIAR claims. Artefact: Evidence set

03 Predict veracity: Map the claim to one of six ordered truthfulness labels. Artefact: Verdict

04 Explain the result: Turn the label and retrieved context into a concise or detailed answer. Artefact: Response

The claim-to-verdict loop keeps retrieval, classification, and response generation independently inspectable.
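
One way to picture the four stages is as replaceable callables whose outputs are all kept on a single result object. This is a minimal sketch under that assumption; `Evidence`, `Result`, and `run_pipeline` are hypothetical names, and the stages here are stubs rather than the project's retriever or classifier.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Evidence:
    statement: str
    score: float


@dataclass
class Result:
    query: str
    evidence: List[Evidence]
    verdict: str
    response: str


def run_pipeline(
    query: str,
    retrieve: Callable[[str], List[Evidence]],
    classify: Callable[[str], str],
    explain: Callable[[str, str, List[Evidence]], str],
) -> Result:
    """Run the stages in order and keep every intermediate artefact."""
    evidence = retrieve(query)                    # stage 2: evidence set
    verdict = classify(query)                     # stage 3: verdict
    response = explain(query, verdict, evidence)  # stage 4: response
    return Result(query, evidence, verdict, response)


# Stub stages, for illustration only.
result = run_pipeline(
    "A claim to verify",
    retrieve=lambda q: [Evidence("A similar labelled claim", 0.9)],
    classify=lambda q: "half-true",
    explain=lambda q, v, ev: f"Predicted label: {v} ({len(ev)} supporting claim(s))",
)
print(result.response)
```

Since each stage is passed in, any one of them can be swapped or inspected without touching the others.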

Hybrid Retrieval

Political claims are often paraphrased, so neither exact term matching nor semantic similarity is reliable alone. FactCheckLIAR normalizes both signals and fuses them before selecting supporting claims.
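
The text does not say which normalization or fusion rule is used, so the following sketch assumes min-max scaling and a weighted sum. The `alpha` weight, the function names, and the toy scores are all hypothetical.

```python
from typing import Dict, List, Tuple


def min_max(scores: Dict[int, float]) -> Dict[int, float]:
    """Rescale scores to [0, 1]; identical scores all map to 0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse(
    lexical: Dict[int, float],
    semantic: Dict[int, float],
    alpha: float = 0.5,  # assumed weight on the lexical channel
    top_k: int = 3,
) -> List[Tuple[int, float]]:
    """Weighted sum of normalized scores; a missing document counts as 0."""
    lex, sem = min_max(lexical), min_max(semantic)
    docs = set(lex) | set(sem)
    fused = {d: alpha * lex.get(d, 0.0) + (1 - alpha) * sem.get(d, 0.0) for d in docs}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


# Toy scores keyed by claim index (not real LIAR results).
bm25_scores = {0: 7.2, 1: 1.1, 2: 3.4}      # unbounded lexical scores
dense_scores = {0: 0.41, 1: 0.88, 2: 0.80}  # e.g. cosine similarities
print(fuse(bm25_scores, dense_scores))
```

In this toy case claim 2 ranks first even though it tops neither list, because it scores reasonably on both channels. Normalizing first matters: raw BM25 scores are unbounded and would otherwise dominate similarity values that sit in a narrow range.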

BM25: Keep exact political language influential.

Tokenized LIAR statements are ranked by term relevance, preserving names, places, institutions, and distinctive phrases.

Best at: Names · phrases · exact overlap
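
To show why term-based ranking favours exact wording, here is a from-scratch Okapi BM25 sketch. It is not the project's implementation: the whitespace tokenizer, the `k1` and `b` defaults, and the three example sentences are assumptions made for illustration.

```python
import math
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    # Assumed tokenizer: lowercase and split on whitespace.
    return text.lower().split()


class BM25:
    def __init__(self, corpus: List[str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.docs = [tokenize(d) for d in corpus]
        self.tf = [Counter(d) for d in self.docs]
        self.avgdl = sum(len(d) for d in self.docs) / len(self.docs)
        n = len(self.docs)
        df = Counter(term for d in self.docs for term in set(d))
        # Smoothed IDF that stays non-negative for very common terms.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query: str, index: int) -> float:
        tf, dl = self.tf[index], len(self.docs[index])
        total = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # terms absent from the document add nothing
            norm = tf[term] + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * tf[term] * (self.k1 + 1) / norm
        return total

    def rank(self, query: str) -> List[int]:
        scores = [self.score(query, i) for i in range(len(self.docs))]
        return sorted(range(len(scores)), key=scores.__getitem__, reverse=True)


# Made-up sentences, not LIAR records.
claims = [
    "the senator voted to raise taxes",
    "unemployment fell in ohio last year",
    "the governor cut school funding",
]
bm25 = BM25(claims)
print(bm25.rank("ohio unemployment"))
```

Only the second sentence shares terms with the query, so it ranks first and the others score zero. A paraphrase with no shared words would also score zero, which is the gap the semantic channel is meant to cover.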

Three views of the same retrieval stage: exact language, semantic similarity, and the fused evidence ranking.

Six-Way Classification

The classifier is unshDee/liar_qa, a bert-base-uncased sequence classifier fine-tuned for the six LIAR labels. A local model directory takes priority; the saved model is downloaded only when it is not already available.
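
The local-first loading rule can be sketched as a small resolver in front of the Hugging Face loaders. The directory `models/liar_qa` and both function names are hypothetical; checking for `config.json` is an assumed way to detect a saved model, and `load_classifier` needs the `transformers` package installed.

```python
from pathlib import Path

HUB_ID = "unshDee/liar_qa"          # model name given in the text
LOCAL_DIR = Path("models/liar_qa")  # hypothetical local directory


def resolve_model_source(local_dir: Path = LOCAL_DIR, hub_id: str = HUB_ID) -> str:
    """Prefer a local model directory; otherwise fall back to the hub id."""
    if (local_dir / "config.json").is_file():
        return str(local_dir)
    return hub_id


def load_classifier(source: str):
    """Load the tokenizer and sequence classifier from a path or hub id."""
    # Imported lazily so the resolver works without transformers installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(source)
    model = AutoModelForSequenceClassification.from_pretrained(source)
    return tokenizer, model


print(resolve_model_source())
```

`from_pretrained` accepts either a directory path or a hub identifier, so one resolved string is enough to cover both cases.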

Claim: What is being asserted?

Evidence: Which LIAR records are closest?

Context: Who said it, where, and when?

BERT classifier: One of six truthfulness labels, with retrieved context retained for explanation

The classifier produces the verdict; retrieval and metadata provide the evidence surface around it.

Human-Readable Response

A label is useful for evaluation, but a visitor needs an answer. The final stage can produce a reproducible template or ask a local Ollama model to turn the same evidence into more natural prose.

Stable, reproducible wording

Template responses work without an LLM and make repeated runs easy to compare. The --no-llm flag forces this path.

Both response modes receive the same predicted label and retrieved evidence; only the wording layer changes.
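
A deterministic wording layer can be as small as a format string. The template text and the `template_response` name below are hypothetical; the project's actual wording may differ.

```python
from typing import List

# Hypothetical wording, chosen for illustration.
TEMPLATE = 'The claim "{claim}" is rated {label}. Closest labelled claims:\n{evidence}'


def template_response(claim: str, label: str, evidence: List[str]) -> str:
    """Deterministic answer: the same inputs always give the same text."""
    lines = "\n".join(f"- {item}" for item in evidence) or "- (none retrieved)"
    return TEMPLATE.format(claim=claim, label=label, evidence=lines)


print(template_response("A claim to verify", "half-true", ["A similar labelled claim"]))
```

Because nothing here is sampled, two runs on the same claim can be compared line by line.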

Operational Design

Persistent Indexes

Rebuilding the sparse and dense indexes on every start slowed iteration. The current version caches both indexes and uses a hash of the dataset to invalidate them whenever data/train.tsv changes.

Rebuild indexes: 30–60 s

Encode claims · build FAISS · build BM25

Load valid cache: 2–3 s

Verify dataset hash · restore indexes

Persistent indexes turn repeat startup into a cache validation and load rather than a full rebuild.
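
Hash-based invalidation can be sketched with the standard library alone. The text does not name the hash function or the cache layout, so SHA-256, the `cache/meta.json` file, and the function names are assumptions.

```python
import hashlib
import json
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Hash the dataset in chunks so large files are not read at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cache_is_valid(dataset: Path, meta_file: Path) -> bool:
    """True when the stored hash matches the current dataset contents."""
    if not (dataset.is_file() and meta_file.is_file()):
        return False
    stored = json.loads(meta_file.read_text()).get("dataset_sha256")
    return stored == file_sha256(dataset)


def record_hash(dataset: Path, meta_file: Path) -> None:
    """Call after a rebuild so the next start can reuse the indexes."""
    meta_file.parent.mkdir(parents=True, exist_ok=True)
    meta_file.write_text(json.dumps({"dataset_sha256": file_sha256(dataset)}))


dataset = Path("data/train.tsv")
meta = Path("cache/meta.json")  # hypothetical cache location
print("load cached indexes" if cache_is_valid(dataset, meta) else "rebuild indexes")
```

Hashing file contents rather than comparing modification times means a copied or re-downloaded but identical dataset still counts as a cache hit.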

Failure-Safe Generation

Ollama is an enhancement rather than a runtime dependency. If it is unavailable or deliberately disabled, the application falls back to the deterministic response path instead of withholding a result.
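
The fallback rule amounts to a guarded call. In this sketch the two wording layers are passed in as plain callables, so no Ollama client API is assumed; `generate_response` and `unavailable_llm` are hypothetical names.

```python
from typing import Callable, Optional


def generate_response(
    template: Callable[[], str],
    llm: Optional[Callable[[], str]] = None,
    use_llm: bool = True,
) -> str:
    """Try the LLM wording when allowed; always fall back to the template."""
    if use_llm and llm is not None:
        try:
            text = llm()
            if text and text.strip():
                return text.strip()
        except Exception:
            # Any failure (server down, timeout, bad reply) uses the fallback.
            pass
    return template()


def unavailable_llm() -> str:
    # Stand-in for a local model server that cannot be reached.
    raise ConnectionError("Ollama server not reachable")


print(generate_response(lambda: "Rated half-true.", llm=unavailable_llm))
```

An empty reply is treated the same as an error, so the caller always receives some answer.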

Interfaces

The same pipeline is available as a direct command-line workflow and a Streamlit interface. The terminal exposes evidence for development; the browser version focuses on claim entry, detail level, and response mode.

Terminal Workflow

factcheckliar: local verification

$ python app.py --query "A claim to verify" --verbose --no-llm
Loading cached BM25 and FAISS indexes
Retrieving lexical and semantic evidence
Ready: evidence, six-way label, deterministic explanation
$ streamlit run streamlit_app.py
Local interface available in the browser

A reproducible terminal run loads cached retrieval indexes, gathers evidence, and can launch the browser interface.
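
The three flags in the session above could be parsed with `argparse` along these lines. Only the flag names come from the example command; the help strings, defaults, and the `parse_args` wrapper are assumptions.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Check a claim against LIAR.")
    parser.add_argument("--query", help="statement to verify")
    parser.add_argument("--verbose", action="store_true", help="show retrieved evidence")
    parser.add_argument("--no-llm", action="store_true", help="force template responses")
    return parser.parse_args(argv)


args = parse_args(["--query", "A claim to verify", "--verbose", "--no-llm"])
print(args.query, args.verbose, args.no_llm)
```

`argparse` exposes `--no-llm` as the attribute `no_llm`, which can then gate the generation step.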

Streamlit Demo

The hosted interface keeps the interaction deliberately small. The demo action in the project header opens it in a separate tab, avoiding a slow or sleeping third-party embed inside the article.

Claim input: Paste or type the statement to check

Evidence detail: Switch between concise and verbose views

Generation mode: Choose deterministic or local-LLM wording

Resource cache: Reuse models and indexes across interactions

The result is not a claim of automated certainty. It is a compact, inspectable workflow for comparing a new statement with labelled political claims and seeing how the predicted verdict was assembled.