Feb 28, 2024 01:28 PM
Paper Reading List
title
Type
Status
abstract
link
Publish Date
Author
LLM & KG
Done
Introduce external knowledge without losing topological structure information.
1. Extract a subgraph from the KG based on the query
2. Use the LLM to merge the subgraphs
3. Feed them to the LLM as reasoning paths; prompt it to produce the answer along with a mind map (see the sketch after this entry)
Sep 15, 2023
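A minimal Python sketch of the subgraph → merge → reasoning-path pipeline noted in the entry above, assuming a toy triple store; the toy KG, the string-match extractor, and the `call_llm` stub are placeholders, not the paper's actual implementation.

```python
TOY_KG = [
    ("aspirin", "treats", "headache"),
    ("aspirin", "interacts_with", "ibuprofen"),
    ("headache", "symptom_of", "migraine"),
]

def extract_subgraph(query: str, kg):
    """Step 1: keep triples whose head or tail entity is mentioned in the query."""
    q = query.lower()
    return [t for t in kg if t[0] in q or t[2] in q]

def merge_as_paths(triples):
    """Steps 2-3: merge the retrieved triples and linearize them as reasoning paths."""
    return [f"{h} --{r}--> {t}" for h, r, t in triples]

def call_llm(prompt: str) -> str:
    """Placeholder for an actual LLM call."""
    return "<answer and mind map produced by the LLM>"

query = "What does aspirin treat and what might a headache indicate?"
paths = merge_as_paths(extract_subgraph(query, TOY_KG))
prompt = (
    "Reasoning paths:\n" + "\n".join(paths)
    + f"\n\nQuestion: {query}\nAnswer with a mind map of your reasoning."
)
print(call_llm(prompt))
```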
LLM & KG
Done
Introduce structured reasoning abilities into LMs. Initialize entity and relation embeddings from PLM text embeddings, then follow the query2Box approach to stepwise and explicitly execute the reasoning trace, so that the query embedding ends up similar to the answer embedding.
Jul 15, 2023
LLM & KG
Done
A domain-specific data-augmentation approach for fine-tuning PLMs.
For example, swapping entities that share the same pattern, increasing expressivity/diversity, etc. (engineering-level tricks); see the sketch after this entry.
Jun 5, 2023
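A minimal sketch of the entity-swap style augmentation noted above: same-type entities are substituted into a shared pattern to create extra fine-tuning text. The type lexicon and the template are invented purely for illustration.

```python
import random

ENTITY_POOL = {
    "DRUG": ["aspirin", "ibuprofen", "paracetamol"],
    "DISEASE": ["headache", "fever"],
}
TEMPLATE = "{DRUG} is commonly used to relieve {DISEASE}."

def augment(n: int, seed: int = 0):
    """Generate n distinct sentences by swapping same-type entities."""
    random.seed(seed)
    samples = set()
    while len(samples) < n:
        samples.add(TEMPLATE.format(
            DRUG=random.choice(ENTITY_POOL["DRUG"]),
            DISEASE=random.choice(ENTITY_POOL["DISEASE"]),
        ))
    return sorted(samples)

for sentence in augment(4):
    print(sentence)
```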
LLM & KG
Done
A method that exploits global information to handle questions whose related triples are disconnected.
1. Convert the KG into a corpus of passages
2. Use the question and answer choices to retrieve the top-k related passages
3. Concatenate them and let the LLM reason to the answer (see the sketch after this entry)
May 30, 2023
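A minimal sketch of the KG-as-passages pipeline above; the triple verbalizer, the overlap-based retriever, and `call_llm` are illustrative stand-ins for whatever the paper actually uses.

```python
TOY_KG = [
    ("paris", "capital_of", "france"),
    ("france", "located_in", "europe"),
    ("berlin", "capital_of", "germany"),
]

def kg_to_passages(kg):
    """Step 1: verbalize each triple into a short passage."""
    return [f"{h} {r.replace('_', ' ')} {t}." for h, r, t in kg]

def top_k_passages(question, choices, passages, k=2):
    """Step 2: rank passages by token overlap with the question and choices."""
    query_tokens = set((question + " " + " ".join(choices)).lower().split())
    ranked = sorted(passages,
                    key=lambda p: -len(query_tokens & set(p.lower().split())))
    return ranked[:k]

def call_llm(prompt: str) -> str:
    """Placeholder for an actual LLM call."""
    return "<LLM answer>"

question = "Which continent is Paris in?"
choices = ["Europe", "Asia"]
context = " ".join(top_k_passages(question, choices, kg_to_passages(TOY_KG)))
# Step 3: concatenate the retrieved passages and let the LLM reason.
print(call_llm(f"Context: {context}\nQuestion: {question}\nChoices: {choices}"))
```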
LLM & KG
Done
1. Treat the LLM as a reasoner agent and decouple the semantics inside the KG into pure symbols, reducing the impact of hallucination.
2. Deterministically decompose the complex query so the LM does better when solving each part separately (see the sketch after this entry).
May 24, 2023
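A minimal sketch of the two ideas above: replacing KG semantics with opaque symbols, and deterministically splitting a multi-hop query into single-hop steps. The entity names and the splitting rule are illustrative only, not the paper's interface.

```python
TRIPLES = [("aspirin", "treats", "headache"), ("headache", "symptom_of", "migraine")]

def symbolize(triples):
    """Map every entity/relation string to an anonymous symbol (e1, r1, ...)."""
    table, counters = {}, {"e": 0, "r": 0}
    def sym(name, kind):
        if name not in table:
            counters[kind] += 1
            table[name] = f"{kind}{counters[kind]}"
        return table[name]
    sym_triples = [(sym(h, "e"), sym(r, "r"), sym(t, "e")) for h, r, t in triples]
    return sym_triples, table

def decompose(relations, start_entity):
    """Deterministically split a chained query into single-hop sub-queries."""
    hops = [f"hop {i + 1}: follow {rel}" for i, rel in enumerate(relations)]
    return [f"start at {start_entity}"] + hops

sym_triples, mapping = symbolize(TRIPLES)
print(sym_triples)   # e.g. [('e1', 'r1', 'e2'), ('e2', 'r2', 'e3')]
print(decompose(["treats", "symptom_of"], "aspirin"))
```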
improving
Done
propose symbol tuning, which fine-tunes a language model on input-label mappings unrelated to semantic priors. They aim to investigate whether LLMs can induce input-label patterns on unseen in-context learning tasks and thereby further improve reasoning abilities.
improving
Done
Concatenating demonstrations (1) offers almost no control over the contribution of each demo to the model prediction and (2) makes it infeasible to fit many examples into the context. Thus, this paper proposes a Demonstration Ensembling method.
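A minimal sketch of demonstration ensembling as summarized above: the demos are split into buckets, each bucket prompts the model separately, and the per-bucket predictions are aggregated. The `call_llm_label_probs` scorer is a placeholder, not a real API.

```python
from collections import defaultdict

def chunk(demos, size):
    return [demos[i:i + size] for i in range(0, len(demos), size)]

def call_llm_label_probs(prompt, labels):
    """Placeholder: pretend the model returns a probability per label."""
    return {label: 1.0 / len(labels) for label in labels}

def ensemble_predict(demos, query, labels, bucket_size=2):
    totals = defaultdict(float)
    for bucket in chunk(demos, bucket_size):
        prompt = "\n".join(f"{x} -> {y}" for x, y in bucket) + f"\n{query} -> "
        for label, p in call_llm_label_probs(prompt, labels).items():
            totals[label] += p          # average/ensemble over buckets
    return max(totals, key=totals.get)

demos = [("great movie", "pos"), ("boring plot", "neg"),
         ("loved it", "pos"), ("waste of time", "neg")]
print(ensemble_predict(demos, "fantastic acting", ["pos", "neg"]))
```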
evaluation
Done
symbol replacement affects the mathematical ability of BERT-like models
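A minimal sketch of a symbol-replacement probe in the spirit of the finding above: rewrite an arithmetic item with unfamiliar surface symbols and compare model behavior on the two versions. The replacement mapping is an arbitrary illustration.

```python
REPLACEMENT = {"1": "α", "2": "β", "3": "γ", "+": "⊕", "=": "≐"}

def replace_symbols(text: str) -> str:
    """Swap familiar digits/operators for unfamiliar symbols."""
    return "".join(REPLACEMENT.get(ch, ch) for ch in text)

original = "1 + 2 = 3"
perturbed = replace_symbols(original)
print(original)   # 1 + 2 = 3
print(perturbed)  # α ⊕ β ≐ γ
```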
evaluation
Done
investigate the effects of semantic priors and input-label mapping on in-context learning using different-scale models and instruction-tuned models. They focus on the influence of in-context examples on in-context learning performance.
evaluation
Done
CoT explanations are not faithful: shown by adding bias features to the demonstrations (e.g., the correct answer is always A).
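A minimal sketch of how such a biased few-shot prompt could be constructed: every demonstration is arranged so the gold option sits in slot (A), and one then checks whether the CoT for a new question acknowledges the bias. The toy questions are placeholders.

```python
def build_biased_prompt(examples, test_question):
    lines = []
    for question, options, answer in examples:
        # Reorder options so the gold answer always sits in slot (A).
        reordered = [answer] + [o for o in options if o != answer]
        opts = " ".join(f"({chr(65 + i)}) {o}" for i, o in enumerate(reordered))
        lines.append(f"Q: {question}\n{opts}\n"
                     "Let's think step by step... The answer is (A).")
    lines.append(f"Q: {test_question}\nLet's think step by step...")
    return "\n\n".join(lines)

examples = [
    ("Which is a fruit?", ["car", "apple"], "apple"),
    ("Which is a color?", ["blue", "dog"], "blue"),
]
print(build_biased_prompt(examples, "Which is an animal? (A) rock (B) cat"))
```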
Done
The expressive power of GNNs for fitting FOC_2 node classifiers.
Whether a node is true or false is verified by a logic formula; the hope is that a GNN can simulate such a logic classifier.
In progress
LLMs describe logical rules, and their inference is equivalent to resolution over those logical rules.
induction
Done
The interpretability problem in RL: why a particular action is taken.
Based on this, the policy pi is rewritten in a differentiable form built from logic rules. What used to be a black-box decision now explicitly invokes intermediate logic rules. The way rules are learned is very similar to RNNLogic: first propose candidates from initial rules, then learn their weights and select the top-k rules.
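A minimal sketch of a policy expressed as a weighted set of logic rules, in the RNNLogic-like spirit described above: candidate rules carry learnable weights, actions are scored by weighted rule firings, and the top-k firing rules serve as the explanation. The rules, state, and weights are toy inventions, not the paper's actual setup.

```python
# Each rule: (name, predicate over the state, action it supports)
RULES = [
    ("low_battery -> recharge", lambda s: s["battery"] < 0.2, "recharge"),
    ("obstacle_ahead -> turn",  lambda s: s["obstacle"],      "turn"),
    ("default -> move_forward", lambda s: True,               "move_forward"),
]
WEIGHTS = [1.5, 2.0, 0.5]   # learned in the real method; fixed constants here

def policy(state, k=2):
    fired = [(w, name, action)
             for (name, pred, action), w in zip(RULES, WEIGHTS) if pred(state)]
    scores = {}
    for w, _, action in fired:
        scores[action] = scores.get(action, 0.0) + w
    best_action = max(scores, key=scores.get)
    top_rules = [name for _, name, _ in sorted(fired, reverse=True)[:k]]
    return best_action, top_rules

print(policy({"battery": 0.1, "obstacle": True}))
```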
In progress
HOI task: predict the interaction or relation between a human and an object. Since the task involves some reasoning, the paper 1. modifies the transformer architecture so it can attend at the triplet level, giving the transformer architectural support for symbolic reasoning; 2. adds logic rules as regularization constraints (using fuzzy logic). [TODO]
Done
Visual reasoning QA that requires some reasoning ability or multi-hop queries, across different domains.
The language query is given as a logic tree: intermediate nodes are independent of the specific domain, while leaf nodes are grounded to a concrete modality/domain. Very similar in spirit to AOG.
leverages LLMs to convert the questions to a logic-based domain-independent representation that is subsequently grounded
In progress
Induce logic rules from human actions; similar to RNNLogic, it is optimized with an EM algorithm over a rule generator and a reasoning predictor.
evaluation
Done
Train on KG data in different ways; the finding is that pre-training, rather than instruction-tuning, is the key to acquiring reasoning ability.
It further confirms that reasoning essentially aggregates reasoning paths seen in pre-training.
based on formal description logic, create a VQA benchmark with complex logical reasoning
improving
Done
Builds a corpus containing a complete axiom system for logical deduction and fine-tunes on it, finding some improvement in reasoning. It also stresses that more complex formulas and more distractor facts help; but reasoning depth and transfer to realistic settings remain hard to solve.
improving
Done
Motivated by the observation that many sentences are logically consistent, not only the original ground-truth text. Accordingly, a verifier module is introduced to score logical consistency, encouraging the generator to produce more logically consistent sentences.
improving
Done
Introduce logic embeddings: parse each sentence into a logical structure, learn an embedding for it, and feed that into the input embeddings.
evaluation
In progress
Shows theoretically that, after extensive fine-tuning, transformers can do template matching (same template, different variables), i.e., the so-called first-order generalization; but for symbolic tasks they cannot.
For (i) regression tasks, we prove that transformers generalize when trained, but require astonishingly large quantities of training data. For (ii) next-token-prediction tasks with symbolic labels, we show an “inverse scaling law”: transformers fail to generalize as their embedding dimension increases.
Proposes decision diagrams as a bridge connecting ML and symbolic solvers (constraint reasoning). Very similar to circuits.
In progress
A way to generate more graph-reasoning data with the help of ILP. The rule-mining approach existed before; the labels need to be aggregated.
Why the extra data-augmentation step is needed is puzzling.
evaluation
Done
An empirical study of the limitations of LLMs in compositionality.
Provides a measure of composition from the perspective of computation graphs; uses relative entropy and frequency to probe the causes of failures and successes; and theoretically proves error propagation.
improving
Done
Prompt the LM to generate code; the parts an API can execute are executed, and the parts that cannot be executed are sent back as queries to the LM. This ensures that some semantic problems can still be expressed as pseudo-code (making up for the difficulty of handling rich semantics symbolically).
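A minimal sketch of the "execute what you can, ask the LM for the rest" idea summarized above. The pseudo-program, the tiny evaluator, and the `ask_llm` stub are illustrative placeholders rather than the paper's actual interface.

```python
def ask_llm(question: str) -> str:
    """Placeholder for a call back to the language model."""
    return f"<LM answer to: {question}>"

# A pseudo-program: executable arithmetic steps mixed with semantic queries.
PROGRAM = [
    ("total", "17 + 25"),                             # executable directly
    ("sentiment", "LM('Is the review positive?')"),   # must be delegated
]

def run(program):
    env = {}
    for var, expr in program:
        if expr.startswith("LM("):
            env[var] = ask_llm(expr[4:-2])            # strip the LM('...') wrapper
        else:
            env[var] = eval(expr, {"__builtins__": {}}, dict(env))
    return env

print(run(PROGRAM))
```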
evaluation
Not started
This report analyses multiple logical reasoning datasets, including popular benchmarks like LogiQA and ReClor and newly released datasets like AR-LSAT. GPT-4 and ChatGPT perform reasonably well on traditional benchmarks but poorly on OOD data.
evaluation
Uses a controllable dataset, LEGO, to probe how transformers work during training, showing that pre-training helps even when the task is unrelated, and that chain-of-reasoning may learn certain shortcuts.
evaluation
Done
examine the performance of GPT-3.5 and GPT-4 through a thorough technical evaluation of different reasoning tasks (deductive, inductive, abductive, analogical, causal, and multi-hop reasoning) across eleven distinct datasets, framed as question-answering tasks.
evaluation
Done
evaluate fifteen logical reasoning datasets with fine-grained metrics (answer correctness, explanation correctness, explanation completeness, and explanation redundancy). Meanwhile, they propose a neutral-content dataset.
evaluation
Done
measure task-level generalizability by taking tasks on which LMs perform well and altering the conditions or rules under which these tasks are performed
evaluation
GPT-3 and related models are fragile under identifier swaps in programs, suggesting that these models may only possess a shallow understanding of code.
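A minimal sketch of an identifier-swap probe in the spirit of the finding above: two identifiers in a snippet are swapped (yielding misleading but syntactically valid code), and model behavior is compared on both versions. The snippet and the swap are illustrative.

```python
ORIGINAL = """
def total_price(prices, tax):
    subtotal = sum(prices)
    return subtotal * (1 + tax)
"""

def swap_identifiers(code: str, a: str, b: str) -> str:
    """Exchange two identifier names everywhere in the snippet."""
    placeholder = "__tmp__"
    return code.replace(a, placeholder).replace(b, a).replace(placeholder, b)

print(swap_identifiers(ORIGINAL, "subtotal", "tax"))
```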
evaluation
Done
Analyzes CoT demonstrations by providing rationales with completely invalid reasoning steps versus rationales that are relevant to the query and have correctly ordered reasoning steps. Conclusion: the model shifts into the step-by-step output space when demonstrations are provided.
evaluation
a deeper understanding of CoT-based few-shot prompting mechanisms in large language models (by modifying the thoughts): what in few-shot prompting actually does the work
evaluation
identify a set of simple symbolic manipulation tasks and uncover the limitations of the LMs in arithmetic and symbolic induction
LLM & KG
Done
learning universal and transferable graph representations by leveraging meta-rule structure information (invariances across different graphs)
Somewhat similar to RulE: rules with the same structure are initialized with a shared embedding.
evaluation
Done
A new evaluation approach based on randomly sampled skill mixes, which to some extent prevents rote, drill-style learning; it finds that some models perform well only on single tasks while lacking general-purpose ability.