Note Category

Agent Planning with World Knowledge Model

Basic Information

Released 2024/05/13 (not yet formally published at a conference)
Shuofei Qiao, Runnan Fang, Ningyu Zhang et al. @ Zhejiang University, National University of Singapore, Alibaba Group

Problem Description

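In recent years, large language models (LLMs) have progressed rapidly on many natural language processing problems, and work has recently begun to appear that uses an LLM as an agent model for planning problems in physical environments. However, because almost all current SOTA LLMs are autoregressive models, what the model actually does is predict the next output token; it has no real understanding of the physical environment.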

...About 20 min

Attention is all you need

Basic Information

NIPS 2017 (now NeurIPS)
Ashish Vaswani, Noam Shazeer, Niki Parmar et al. from Google Brain and Google Research

Problem Description

RNN

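In recent years, tasks such as natural language processing (NLP) and machine translation have frequently used model architectures such as the Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), and Gated Recurrent Neural Network, and recurrent models combined with an encoder-decoder architecture have become popular.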

...About 16 min

PiPa: Pixel- and Patch-wise Self-supervised Learning for Domain Adaptative Semantic Segmentation

Basic Information

Mu Chen, Zhedong Zheng, Yi Yang, Tat-Seng Chua
2022 ACM Multimedia

Problem Description

Like DACS, ProDA, DAFormer, and HRDA, which were covered in earlier notes, this paper tackles domain adaptation for semantic segmentation in an unsupervised manner.

...About 7 min

Dropout: A Simple Way to Prevent Neural Networks from Overfitting

Basic Information

Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, Ruslan Salakhutdinov @ University of Toronto
2014 JMLR

Problem Description

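In recent years it has been observed that the more parameters a neural network has, the greater its expressive power, and usually the better its performance. However, as the parameter count grows, the model also becomes increasingly prone to overfitting.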

...About 10 min

HRDA: Context-Aware High-Resolution Domain-Adaptive Semantic Segmentation

Basic Information

Lukas Hoyer, Dengxin Dai, Luc Van Gool @ ETH Zurich & MPI for Informatics
2022 ECCV

Problem Description

Like DAFormer, this paper focuses on UDA for semantic segmentation.

...About 14 min

DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation

Basic Information

Lukas Hoyer, Dengxin Dai, Luc Van Gool @ ETH Zurich & MPI for Informatics
2022 CVPR

Image from Lukas Hoyer, Dengxin Dai, Luc Van Gool (2022)

...About 12 min

Prototypical Pseudo Label Denoising and Target Structure Learning for Domain Adaptive Semantic Segmentation

Basic Information

Pan Zhang, Bo Zhang, Ting Zhang, Dong Chen, Yong Wang, Fang Wen @ University of Science and Technology of China, Microsoft Research Asia
2021 CVPR

...About 10 min

Agent57: Outperforming the Atari Human Benchmark

Basic Information

Adrià Puigdomènech Badia, Bilal Piot, Steven Kapturowski, et al. @ Google DeepMind
2020 ICML

Problem Description

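In RL, Atari games are a very important benchmark. Earlier RL models already achieve quite good performance on most Atari games; MuZero and R2D2, for example, outperform humans on 51 and 52 of the 57 games, respectively. Unfortunately, on the remaining games these SoTA methods usually fail to learn at all.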

...About 20 min

Playing Atari with Deep Reinforcement Learning

Basic Information

2013 NeurIPS
Volodymyr Mnih, Koray Kavukcuoglu, David Silver et al.
The method proposed in this paper is called DQN (Deep Q-Network).

Problem Description

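In RL, learning from high-dimensional sensory data (e.g., visual images or speech) as the agent's input has long been a major challenge. However, in recent years deep learning has become able to extract features from this kind of data and thereby accomplish many complex tasks.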

...About 6 min

Noisy Networks for Exploration

Basic Information

2018 ICLR
Meire Fortunato, Mohammad Gheshlaghi Azar, Bilal Piot, et al. @ Google DeepMind

Problem Description

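In past RL work, we have often relied on adding randomness to the agent's policy to increase exploration, for example ϵ-greedy and entropy regularization. However, such approaches usually explore efficiently only in relatively simple environments; real-world settings are rarely that simple, and the difficulty of exploration there can even grow exponentially.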
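
For readers unfamiliar with ϵ-greedy, here is a minimal sketch of the idea, not code from the paper. The function name `epsilon_greedy` and its parameters `q_values`, `epsilon`, and `rng` are hypothetical names chosen for illustration; `q_values` is assumed to be a plain list of estimated action values.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return an action index: random with probability epsilon, else greedy.

    q_values: list of estimated action values (illustrative assumption).
    epsilon:  exploration probability in [0, 1].
    rng:      any object exposing random() and randrange(), e.g. random.Random.
    """
    if rng.random() < epsilon:
        # Explore: pick an action uniformly at random, ignoring the estimates.
        return rng.randrange(len(q_values))
    # Exploit: pick the action with the largest estimate (first one on ties).
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

The randomness here is independent of the state and of what the agent has learned so far, which is one way to see why this kind of exploration can be inefficient in harder environments.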

...About 13 min