Feb 28, 2024 01:28 PM
Paper Reading List
title
Type
Status
abstract
link
Publish Date
Author
LLM & KG
Done
Introduce external knowledge without losing topological structure information.
1. Extract a subgraph from the KG based on the query
2. Use the LLM to merge the subgraphs
3. Feed them to the LLM as reasoning paths; prompt it to produce the answer along with a mind map (see the sketch after this entry)
Sep 15, 2023
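A minimal Python sketch of the subgraph → merge → reasoning-path pipeline noted in the entry above, assuming a toy triple store; the toy KG, the string-match extractor, and the `call_llm` stub are placeholders, not the paper's actual implementation.

```python
TOY_KG = [
    ("aspirin", "treats", "headache"),
    ("aspirin", "interacts_with", "ibuprofen"),
    ("headache", "symptom_of", "migraine"),
]

def extract_subgraph(query: str, kg):
    """Step 1: keep triples whose head or tail entity is mentioned in the query."""
    q = query.lower()
    return [t for t in kg if t[0] in q or t[2] in q]

def merge_as_paths(triples):
    """Steps 2-3: merge the retrieved triples and linearize them as reasoning paths."""
    return [f"{h} --{r}--> {t}" for h, r, t in triples]

def call_llm(prompt: str) -> str:
    """Placeholder for an actual LLM call."""
    return "<answer and mind map produced by the LLM>"

query = "What does aspirin treat and what might a headache indicate?"
paths = merge_as_paths(extract_subgraph(query, TOY_KG))
prompt = (
    "Reasoning paths:\n" + "\n".join(paths)
    + f"\n\nQuestion: {query}\nAnswer with a mind map of your reasoning."
)
print(call_llm(prompt))
```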
LLM & KG
Done
Introduce structured reasoning abilities into LMs. Initialize entity and relation embeddings from PLM text embeddings, then follow the query2Box approach to stepwise and explicitly execute the reasoning trace, so that the query embedding ends up similar to the answer embedding.
Jul 15, 2023
LLM & KG
Done
A domain-specific data-augmentation approach for fine-tuning PLMs.
For example, swapping entities that share the same pattern, increasing expressivity/diversity, etc. (engineering-level tricks); see the sketch after this entry.
Jun 5, 2023
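A minimal sketch of the entity-swap style augmentation noted above: same-type entities are substituted into a shared pattern to create extra fine-tuning text. The type lexicon and the template are invented purely for illustration.

```python
import random

ENTITY_POOL = {
    "DRUG": ["aspirin", "ibuprofen", "paracetamol"],
    "DISEASE": ["headache", "fever"],
}
TEMPLATE = "{DRUG} is commonly used to relieve {DISEASE}."

def augment(n: int, seed: int = 0):
    """Generate n distinct sentences by swapping same-type entities."""
    random.seed(seed)
    samples = set()
    while len(samples) < n:
        samples.add(TEMPLATE.format(
            DRUG=random.choice(ENTITY_POOL["DRUG"]),
            DISEASE=random.choice(ENTITY_POOL["DISEASE"]),
        ))
    return sorted(samples)

for sentence in augment(4):
    print(sentence)
```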
LLM & KG
Done
A method that exploits global information to handle questions whose related triples are disconnected.
1. Convert the KG into a corpus of passages
2. Use the question and answer choices to retrieve the top-k related passages
3. Concatenate them and let the LLM reason to the answer (see the sketch after this entry)
May 30, 2023
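A minimal sketch of the KG-as-passages pipeline above; the triple verbalizer, the overlap-based retriever, and `call_llm` are illustrative stand-ins for whatever the paper actually uses.

```python
TOY_KG = [
    ("paris", "capital_of", "france"),
    ("france", "located_in", "europe"),
    ("berlin", "capital_of", "germany"),
]

def kg_to_passages(kg):
    """Step 1: verbalize each triple into a short passage."""
    return [f"{h} {r.replace('_', ' ')} {t}." for h, r, t in kg]

def top_k_passages(question, choices, passages, k=2):
    """Step 2: rank passages by token overlap with the question and choices."""
    query_tokens = set((question + " " + " ".join(choices)).lower().split())
    ranked = sorted(passages,
                    key=lambda p: -len(query_tokens & set(p.lower().split())))
    return ranked[:k]

def call_llm(prompt: str) -> str:
    """Placeholder for an actual LLM call."""
    return "<LLM answer>"

question = "Which continent is Paris in?"
choices = ["Europe", "Asia"]
context = " ".join(top_k_passages(question, choices, kg_to_passages(TOY_KG)))
# Step 3: concatenate the retrieved passages and let the LLM reason.
print(call_llm(f"Context: {context}\nQuestion: {question}\nChoices: {choices}"))
```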
LLM & KG
Done
1. Treat the LLM as a reasoner agent and decouple the semantics inside the KG into pure symbols, reducing the impact of hallucination.
2. Deterministically decompose the complex query so the LM does better when solving each part separately (see the sketch after this entry).
May 24, 2023
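A minimal sketch of the two ideas above: replacing KG semantics with opaque symbols, and deterministically splitting a multi-hop query into single-hop steps. The entity names and the splitting rule are illustrative only, not the paper's interface.

```python
TRIPLES = [("aspirin", "treats", "headache"), ("headache", "symptom_of", "migraine")]

def symbolize(triples):
    """Map every entity/relation string to an anonymous symbol (e1, r1, ...)."""
    table, counters = {}, {"e": 0, "r": 0}
    def sym(name, kind):
        if name not in table:
            counters[kind] += 1
            table[name] = f"{kind}{counters[kind]}"
        return table[name]
    sym_triples = [(sym(h, "e"), sym(r, "r"), sym(t, "e")) for h, r, t in triples]
    return sym_triples, table

def decompose(relations, start_entity):
    """Deterministically split a chained query into single-hop sub-queries."""
    hops = [f"hop {i + 1}: follow {rel}" for i, rel in enumerate(relations)]
    return [f"start at {start_entity}"] + hops

sym_triples, mapping = symbolize(TRIPLES)
print(sym_triples)   # e.g. [('e1', 'r1', 'e2'), ('e2', 'r2', 'e3')]
print(decompose(["treats", "symptom_of"], "aspirin"))
```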
improving
Done
propose symbol tuning, which fine-tunes a language model on input-label mappings unrelated to semantic priors. They aim to investigate whether LLMs can induce input-label patterns on unseen in-context learning tasks and thereby further improve reasoning abilities.
improving
Done
Concatenating demonstrations (1) offers almost no control over the contribution of each demo to the model prediction and (2) makes it infeasible to fit many examples into the context. Thus, this paper proposes a Demonstration Ensembling method.
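A minimal sketch of demonstration ensembling as summarized above: the demos are split into buckets, each bucket prompts the model separately, and the per-bucket predictions are aggregated. The `call_llm_label_probs` scorer is a placeholder, not a real API.

```python
from collections import defaultdict

def chunk(demos, size):
    return [demos[i:i + size] for i in range(0, len(demos), size)]

def call_llm_label_probs(prompt, labels):
    """Placeholder: pretend the model returns a probability per label."""
    return {label: 1.0 / len(labels) for label in labels}

def ensemble_predict(demos, query, labels, bucket_size=2):
    totals = defaultdict(float)
    for bucket in chunk(demos, bucket_size):
        prompt = "\n".join(f"{x} -> {y}" for x, y in bucket) + f"\n{query} -> "
        for label, p in call_llm_label_probs(prompt, labels).items():
            totals[label] += p          # average/ensemble over buckets
    return max(totals, key=totals.get)

demos = [("great movie", "pos"), ("boring plot", "neg"),
         ("loved it", "pos"), ("waste of time", "neg")]
print(ensemble_predict(demos, "fantastic acting", ["pos", "neg"]))
```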
evaluation
Done
symbol replacement affects the mathematical ability of BERT-like models
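A minimal sketch of a symbol-replacement probe in the spirit of the finding above: rewrite an arithmetic item with unfamiliar surface symbols and compare model behavior on the two versions. The replacement mapping is an arbitrary illustration.

```python
REPLACEMENT = {"1": "α", "2": "β", "3": "γ", "+": "⊕", "=": "≐"}

def replace_symbols(text: str) -> str:
    """Swap familiar digits/operators for unfamiliar symbols."""
    return "".join(REPLACEMENT.get(ch, ch) for ch in text)

original = "1 + 2 = 3"
perturbed = replace_symbols(original)
print(original)   # 1 + 2 = 3
print(perturbed)  # α ⊕ β ≐ γ
```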
evaluation
Done
investigate the effects of semantic priors and input-label mapping on in-context learning using different-scale models and instruction-tuned models. They focus on the influence of in-context examples on in-context learning performance.
evaluation
Done
CoT explanations are not faithful: shown by adding bias features to the demonstrations (e.g., the correct answer is always A).
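A minimal sketch of how such a biased few-shot prompt could be constructed: every demonstration is arranged so the gold option sits in slot (A), and one then checks whether the CoT for a new question acknowledges the bias. The toy questions are placeholders.

```python
def build_biased_prompt(examples, test_question):
    lines = []
    for question, options, answer in examples:
        # Reorder options so the gold answer always sits in slot (A).
        reordered = [answer] + [o for o in options if o != answer]
        opts = " ".join(f"({chr(65 + i)}) {o}" for i, o in enumerate(reordered))
        lines.append(f"Q: {question}\n{opts}\n"
                     "Let's think step by step... The answer is (A).")
    lines.append(f"Q: {test_question}\nLet's think step by step...")
    return "\n\n".join(lines)

examples = [
    ("Which is a fruit?", ["car", "apple"], "apple"),
    ("Which is a color?", ["blue", "dog"], "blue"),
]
print(build_biased_prompt(examples, "Which is an animal? (A) rock (B) cat"))
```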
Done
The expressive power of GNNs for fitting FOC_2 node classifiers.
Whether a node is true or false is verified by a logic formula; the hope is that a GNN can simulate such a logic classifier.
In progress
LLMs describe logical rules, and their inference is equivalent to resolution over those logical rules.
induction
Done
The interpretability problem in RL: why a particular action is taken.
Based on this, the policy pi is rewritten in a differentiable form built from logic rules. What used to be a black-box decision now explicitly invokes intermediate logic rules. The way rules are learned is very similar to RNNLogic: first propose candidates from initial rules, then learn their weights and select the top-k rules.
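A minimal sketch of a policy expressed as a weighted set of logic rules, in the RNNLogic-like spirit described above: candidate rules carry learnable weights, actions are scored by weighted rule firings, and the top-k firing rules serve as the explanation. The rules, state, and weights are toy inventions, not the paper's actual setup.

```python
# Each rule: (name, predicate over the state, action it supports)
RULES = [
    ("low_battery -> recharge", lambda s: s["battery"] < 0.2, "recharge"),
    ("obstacle_ahead -> turn",  lambda s: s["obstacle"],      "turn"),
    ("default -> move_forward", lambda s: True,               "move_forward"),
]
WEIGHTS = [1.5, 2.0, 0.5]   # learned in the real method; fixed constants here

def policy(state, k=2):
    fired = [(w, name, action)
             for (name, pred, action), w in zip(RULES, WEIGHTS) if pred(state)]
    scores = {}
    for w, _, action in fired:
        scores[action] = scores.get(action, 0.0) + w
    best_action = max(scores, key=scores.get)
    top_rules = [name for _, name, _ in sorted(fired, reverse=True)[:k]]
    return best_action, top_rules

print(policy({"battery": 0.1, "obstacle": True}))
```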
In progress
HOI task: predict the interaction or relation between a human and an object. Since the task involves some reasoning, the paper 1. modifies the transformer architecture so it can attend at the triplet level, giving the transformer architectural support for symbolic reasoning; 2. adds logic rules as regularization constraints (using fuzzy logic). [TODO]
Done
Visual reasoning QA that requires some reasoning ability or multi-hop queries, across different domains.
The language query is given as a logic tree: intermediate nodes are independent of the specific domain, while leaf nodes are grounded to a concrete modality/domain. Very similar in spirit to AOG.
leverages LLMs to convert the questions to a logic-based domain-independent representation that is subsequently grounded
In progress
Induce logic rules from human actions; similar to RNNLogic, it is optimized with an EM algorithm over a rule generator and a reasoning predictor.
evaluation
Done
Train on KG data in different ways; the finding is that pre-training, rather than instruction-tuning, is the key to acquiring reasoning ability.
It further confirms that reasoning essentially aggregates reasoning paths seen in pre-training.
based on formal description logic, create a VQA benchmark with complex logical reasoning
improving
Done
Builds a corpus containing a complete axiom system for logical deduction and fine-tunes on it, finding some improvement in reasoning. It also stresses that more complex formulas and more distractor facts help; but reasoning depth and transfer to realistic settings remain hard to solve.
improving
Done
Motivated by the observation that many sentences are logically consistent, not only the original ground-truth text. Accordingly, a verifier module is introduced to score logical consistency, encouraging the generator to produce more logically consistent sentences.
improving
Done
Introduce logic embeddings: parse each sentence into a logical structure, learn an embedding for it, and feed that into the input embeddings.
evaluation
In progress
Shows theoretically that, after extensive fine-tuning, transformers can do template matching (same template, different variables), i.e., the so-called first-order generalization; but for symbolic tasks they cannot.
For (i) regression tasks, we prove that transformers generalize when trained, but require astonishingly large quantities of training data. For (ii) next-token-prediction tasks with symbolic labels, we show an “inverse scaling law”: transformers fail to generalize as their embedding dimension increases.
Proposes decision diagrams as a bridge connecting ML and symbolic solvers (constraint reasoning). Very similar to circuits.
In progress
A way to generate more graph-reasoning data with the help of ILP. The rule-mining approach existed before; the labels need to be aggregated.
Why the extra data-augmentation step is needed is puzzling.
evaluation
Done
An empirical study of the limitations of LLMs in compositionality.
Provides a measure of composition from the perspective of computation graphs; uses relative entropy and frequency to probe the causes of failures and successes; and theoretically proves error propagation.
improving
Done
Prompt the LM to generate code; the parts an API can execute are executed, and the parts that cannot be executed are sent back as queries to the LM. This ensures that some semantic problems can still be expressed as pseudo-code (making up for the difficulty of handling rich semantics symbolically).
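A minimal sketch of the "execute what you can, ask the LM for the rest" idea summarized above. The pseudo-program, the tiny evaluator, and the `ask_llm` stub are illustrative placeholders rather than the paper's actual interface.

```python
def ask_llm(question: str) -> str:
    """Placeholder for a call back to the language model."""
    return f"<LM answer to: {question}>"

# A pseudo-program: executable arithmetic steps mixed with semantic queries.
PROGRAM = [
    ("total", "17 + 25"),                             # executable directly
    ("sentiment", "LM('Is the review positive?')"),   # must be delegated
]

def run(program):
    env = {}
    for var, expr in program:
        if expr.startswith("LM("):
            env[var] = ask_llm(expr[4:-2])            # strip the LM('...') wrapper
        else:
            env[var] = eval(expr, {"__builtins__": {}}, dict(env))
    return env

print(run(PROGRAM))
```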
evaluation
Not started
This report analyses multiple logical reasoning datasets, including popular benchmarks like LogiQA and ReClor and newly released datasets like AR-LSAT. GPT-4 and ChatGPT perform reasonably well on traditional benchmarks but poorly on OOD data.
evaluation
Uses a controllable dataset, LEGO, to probe how transformers work during training, showing that pre-training helps even when the task is unrelated, and that chain-of-reasoning may learn certain shortcuts.
evaluation
Done
examine the performance of GPT-3.5 and GPT-4 through a thorough technical evaluation of different reasoning tasks (deductive, inductive, abductive, analogical, causal, and multi-hop reasoning) across eleven distinct datasets, framed as question-answering tasks.
evaluation
Done
evaluate fifteen logical reasoning datasets with fine-grained metrics (answer correctness, explanation correctness, explanation completeness, and explanation redundancy). Meanwhile, they propose a neutral-content dataset.
evaluation
Done
measure task-level generalizability by taking tasks on which LMs perform well and altering the conditions or rules under which these tasks are performed
evaluation
GPT-3 and related models are fragile under identifier swaps in programs, suggesting that these models may only possess a shallow understanding of code.
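A minimal sketch of an identifier-swap probe in the spirit of the finding above: two identifiers in a snippet are swapped (yielding misleading but syntactically valid code), and model behavior is compared on both versions. The snippet and the swap are illustrative.

```python
ORIGINAL = """
def total_price(prices, tax):
    subtotal = sum(prices)
    return subtotal * (1 + tax)
"""

def swap_identifiers(code: str, a: str, b: str) -> str:
    """Exchange two identifier names everywhere in the snippet."""
    placeholder = "__tmp__"
    return code.replace(a, placeholder).replace(b, a).replace(placeholder, b)

print(swap_identifiers(ORIGINAL, "subtotal", "tax"))
```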
evaluation
Done
Analyzes CoT demonstrations by providing rationales with completely invalid reasoning steps versus rationales that are relevant to the query and have correctly ordered reasoning steps. Conclusion: the model shifts into the step-by-step output space when demonstrations are provided.
evaluation
a deeper understanding of CoT-based few-shot prompting mechanisms in large language models (by modifying the thoughts): what in few-shot prompting actually does the work
evaluation
identify a set of simple symbolic manipulation tasks and uncover the limitations of the LMs in arithmetic and symbolic induction
LLM & KG
Done
learning universal and transferable graph representations by leveraging meta-rule structure information (invariances across different graphs)
Somewhat similar to RulE: rules with the same structure are initialized with a shared embedding.
evaluation
Done
A new evaluation approach based on randomly sampled skill mixes, which to some extent prevents rote, drill-style learning; it finds that some models perform well only on single tasks while lacking general-purpose ability.