Paper reading
Interested in:
(1) Query understanding, Retrieval, Relevance, Rank, Multi-modal
(2)Large Language Model
Keywords: Information Retrieval, Lexicon-aware retrieval, Dense retrieval, dual-encoder, sparse representations for queries and documents

https://research.google/pubs/
https://research.facebook.com/publications/
https://www.microsoft.com/en-us/research/publications/
https://openai.com/research
http://research.baidu.com/Publications
arXiv: https://arxiv.org/search/cs
Google Scholar: https://scholar.google.com/

TODO
[] LLaMA: Open and Efficient Foundation Language Models
(1) Use an n-gram language model to flag low-quality text (perplexity-filter sketch after this list)
(2) Try SentencePiece tokenization (sketch after this list): https://juejin.cn/post/7234795667477561402
[] So simple! Hands-on LLaMA-2 fine-tuning! 🚀🚀🚀
[] https://github.com/yuanzhoulvpi2017/zero_nlp/tree/main/chinese_bloom
[] Multi-modal: an overview of image-text multi-modal model types
[] COS 597G: Understanding Large Language Models
[] https://github.com/microsoft/SimXNS
[]
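For (1) above, a minimal sketch of perplexity-based quality filtering with KenLM; the model path lm.arpa and the threshold are hypothetical and need tuning on a held-out sample of your corpus:

```python
# Sketch: flag low-quality text by n-gram LM perplexity (KenLM).
# Assumes a KenLM model already trained to "lm.arpa" (hypothetical path)
# and whitespace-tokenized input, as in CCNet/LLaMA-style data pipelines.
import kenlm

model = kenlm.Model("lm.arpa")

def is_low_quality(text: str, threshold: float = 1000.0) -> bool:
    # High perplexity under the reference LM suggests noisy/garbled text.
    return model.perplexity(text) > threshold

docs = ["the cat sat on the mat", "zxq kkw ploo rrr qq"]
print([d for d in docs if not is_low_quality(d)])
```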
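For (2), a minimal SentencePiece sketch, assuming a plain-text training corpus at the hypothetical path corpus.txt (LLaMA's tokenizer is a SentencePiece BPE model):

```python
# Sketch: train and use a SentencePiece BPE tokenizer.
import sentencepiece as spm

# Train a small model; "corpus.txt" is a placeholder, one sentence per line.
spm.SentencePieceTrainer.train(
    input="corpus.txt", model_prefix="sp", vocab_size=8000, model_type="bpe"
)

sp = spm.SentencePieceProcessor(model_file="sp.model")
ids = sp.encode("hello world")                 # token ids
print(sp.encode("hello world", out_type=str))  # surface pieces
print(sp.decode(ids))                          # lossless round-trip
```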

LLM
[] Deduplicating Training Data Makes Language Models Better
[x] LLaMA: Open and Efficient Foundation Language Models
[] Llama 2: Open Foundation and Fine-Tuned Chat Models
[] OPT
[] PaLM
[] BLOOM
[] Falcon
[] Chinchilla
[] Bard
[] Claude
[] GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints (see the sketch after this list)
[] Retentive Network: A Successor to Transformer for Large Language Models
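The GQA paper above shares each key/value head across a group of query heads to shrink the KV cache; a minimal PyTorch sketch of just that grouping step (shapes and names are illustrative, not the paper's code):

```python
# Sketch: grouped-query attention core. n_q query heads attend using only
# n_kv key/value heads; each KV head is shared by n_q // n_kv query heads.
import torch
import torch.nn.functional as F

def gqa_attention(q, k, v):
    # q: (batch, n_q, seq, d); k, v: (batch, n_kv, seq, d); n_q % n_kv == 0
    groups = q.shape[1] // k.shape[1]
    k = k.repeat_interleave(groups, dim=1)  # replicate each KV head per group
    v = v.repeat_interleave(groups, dim=1)
    scores = (q @ k.transpose(-2, -1)) / q.shape[-1] ** 0.5
    return F.softmax(scores, dim=-1) @ v

q = torch.randn(2, 8, 16, 64)   # 8 query heads
k = torch.randn(2, 2, 16, 64)   # 2 KV heads -> groups of 4
v = torch.randn(2, 2, 16, 64)
print(gqa_attention(q, k, v).shape)  # torch.Size([2, 8, 16, 64])
```

With n_kv = 1 this reduces to multi-query attention; with n_kv = n_q it is standard multi-head attention.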

PLMs
[x] A Domain Knowledge Enhanced Pre-Trained Language Model for Vertical Search: Case Study on Medicinal Products
[] Pre-Training with Whole Word Masking for Chinese BERT
[x] BioBERT
[x] SciBERT
[] PubMedBERT
[] BiomedBERT
[x] FinBERT

Retrieval/Relevance
[x] A Dual Augmented Two-tower Model for Online Large-scale Recommendation
[x] duoBERT
[x] ColBERT (late-interaction MaxSim; sketch after this list)
[] ColBERTv2
[] Learning by Sorting: Self-supervised Learning with Group Ordering Constraints
[] Supervised Contrastive Learning Approach for Contextual Ranking
[x] RankCSE: Unsupervised Sentence Representation Learning via Learning to Rank
[] Contrastive Fine-tuning Improves Robustness for Neural Rankers
[] Ranking-Enhanced Unsupervised Sentence Representation Learning
[] ConSERT
[x] End-to-End Query Term Weighting
[] VIRT: Improving Representation-based Text Matching via Virtual Interaction
[x] Learning Diverse Document Representations with Deep Query Interactions for Dense Retrieval
[x] LED: Lexicon-Enlightened Dense Retriever for Large-Scale Retrieval
[] Sparse, Dense, and Attentional Representations for Text Retrieval
[] Improving Document Representations by Generating Pseudo Query Embeddings for Dense Retrieval
[] Text Embeddings by Weakly-Supervised Contrastive Pre-training (E5)
[] Large Dual Encoders Are Generalizable Retrievers (GTR)
[x] RetroMAE: Pre-Training Retrieval-oriented Language Models Via Masked Auto-Encoder (RetroMAE)
[] Dense Passage Retrieval for Open-Domain Question Answering (DPR) (in-batch-negative sketch after this list)
[] One Embedder, Any Task: Instruction-Finetuned Text Embeddings (Instructor)
[] Condenser: a Pre-training Architecture for Dense Retrieval
[] Less is more: Pretrain a strong Siamese encoder for dense text retrieval using a weak decoder
[] Distilling knowledge to twin-structured BERT models for efficient retrieval
[] ERNIE-Search: Bridging Cross-Encoder with Dual-Encoder via Self On-the-fly Distillation for Dense Passage Retrieval
[] Pretraining tasks for embedding-based large-scale retrieval
[] Improving bi-encoder document ranking models with two rankers and multi-teacher distillation
[] Unsupervised corpus aware language model pre-training for dense passage retrieval
[x] Que2Search: Fast and Accurate Query and Document Understanding for Search at Facebook
[x] Embedding-based Retrieval in Facebook Search
[] MOBIUS: Towards the Next Generation of Query-Ad Matching in Baidu's Sponsored Search
[] Pre-trained Language Model based Ranking in Baidu Search
[] Simrank++: Query Rewriting through Link Analysis of the Click Graph
[] Embedding-based Product Retrieval in Taobao Search
[] Improving Deep Learning For Airbnb Search
[] Optimizing Airbnb Search Journey with Multi-task Learning
[] Tree-based Text-Vision BERT for Video Search in Baidu Video Advertising
[] Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval (ANCE)
[x] More Robust Dense Retrieval with Contrastive Dual Learning
[] Pre-training via Paraphrasing
[] Constructing Tree-based Index for Efficient and Effective Dense Retrieval
[] Dense Text Retrieval based on Pretrained Language Models: A Survey
[x] SimANS: Simple Ambiguous Negatives Sampling for Dense Text Retrieval
[x] Adversarial retriever-ranker for dense text retrieval
[] Debiased contrastive learning of unsupervised sentence representations
[] RocketQA: An Optimized Training Approach to Dense Passage Retrieval for Open-Domain Question Answering
[] RocketQAv2: A Joint Training Method for Dense Passage Retrieval and Passage Re-ranking
[] PROD: Progressive Distillation for Dense Retrieval
[] Resources and Evaluations for Multi-Distribution Dense Information Retrieval
[x] SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking (sparse-representation sketch after this list)
[] SPLADE v2: Sparse Lexical and Expansion Model for Information Retrieval
[] SparTerm: Learning Term-based Sparse Representation for Fast Text Retrieval
[] From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective
[] Improving Efficient Neural Ranking Models with Cross-Architecture Knowledge Distillation
[] IRGAN
[] Transformer Memory as a Differentiable Search Index
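Sketches for three recurring ideas in the list above. First, ColBERT's late interaction (MaxSim): each query token takes its maximum similarity over document tokens, and the per-token maxima are summed into one relevance score:

```python
# Sketch: ColBERT-style MaxSim scoring over token embeddings.
import torch
import torch.nn.functional as F

def maxsim_score(q_emb, d_emb):
    # q_emb: (n_q_tokens, dim); d_emb: (n_d_tokens, dim); both L2-normalized
    sim = q_emb @ d_emb.T                # (n_q_tokens, n_d_tokens)
    return sim.max(dim=1).values.sum()   # max over doc tokens, sum over query

q = F.normalize(torch.randn(8, 128), dim=-1)
d = F.normalize(torch.randn(50, 128), dim=-1)
print(maxsim_score(q, d))
```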
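Second, the in-batch-negative contrastive loss used by dual-encoder retrievers such as DPR: each query's gold passage is its positive, and every other passage in the batch serves as a negative:

```python
# Sketch: in-batch negatives for a dual encoder (embeddings assumed given).
import torch
import torch.nn.functional as F

def in_batch_negative_loss(q_emb, p_emb):
    # q_emb, p_emb: (batch, dim); row i of p_emb is the positive for query i
    scores = q_emb @ p_emb.T                # (batch, batch) similarity matrix
    labels = torch.arange(q_emb.shape[0])   # positives sit on the diagonal
    return F.cross_entropy(scores, labels)

print(in_batch_negative_loss(torch.randn(4, 128), torch.randn(4, 128)))
```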
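Third, a SPLADE-style sparse representation: MLM logits go through log(1 + ReLU(.)) and are max-pooled over tokens (max pooling as in SPLADE v2), giving one weight per vocabulary term. The checkpoint below is a stand-in; a real SPLADE model is further trained with ranking and sparsity (FLOPS) losses:

```python
# Sketch: SPLADE-style term weighting from an MLM head.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

name = "bert-base-uncased"  # placeholder; use a trained SPLADE checkpoint
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForMaskedLM.from_pretrained(name)

def splade_rep(text: str) -> torch.Tensor:
    enc = tok(text, return_tensors="pt")
    logits = model(**enc).logits                    # (1, seq, vocab)
    weights = torch.log1p(torch.relu(logits))       # non-negative term scores
    mask = enc["attention_mask"].unsqueeze(-1)      # zero out padding
    return (weights * mask).max(dim=1).values.squeeze(0)  # (vocab,)

q = splade_rep("what is dense retrieval")
d = splade_rep("dense retrieval encodes text into embeddings")
print(torch.dot(q, d))   # relevance = sparse dot product over the vocab
print((q > 0).sum())     # number of active (expanded) terms
```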

Rank
[]

Multi-modal
[] BEiT: BERT Pre-Training of Image Transformers

Other
[] Supervised Contrastive Learning
[] Exploring the Landscape of Natural Language Processing Research

Refs:
https://github.com/tangxyw/RecSysPapers/tree/main/Match
https://github.com/caiyinqiong/Semantic-Retrieval-Models
https://github.com/UKPLab/sentence-transformers/tree/master
https://github.com/naver/splade
https://github.com/gabriben/awesome-generative-information-retrieval
https://github.com/UKPLab/sentence-transformers/blob/master/sentence_transformers/losses/MarginMSELoss.py
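The MarginMSELoss link above implements the Margin-MSE distillation objective from "Improving Efficient Neural Ranking Models with Cross-Architecture Knowledge Distillation": the student (e.g. a dual encoder) learns to match the teacher's score margin between a positive and a negative passage rather than the absolute scores. A minimal standalone sketch:

```python
# Sketch: Margin-MSE distillation loss (scores assumed precomputed).
import torch
import torch.nn.functional as F

def margin_mse(s_pos, s_neg, t_pos, t_neg):
    # each input: (batch,) relevance scores for (query, passage) pairs
    return F.mse_loss(s_pos - s_neg, t_pos - t_neg)

student_pos, student_neg = torch.randn(8), torch.randn(8)  # dual-encoder
teacher_pos, teacher_neg = torch.randn(8), torch.randn(8)  # cross-encoder
print(margin_mse(student_pos, student_neg, teacher_pos, teacher_neg))
```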