Natural Language Processing
Large Language Models (LLM)
★★★★★
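Questions cover Wikipedia-related knowledge: given a prompt (the question) and five options, predict the correct answer.
The key is RAG optimization; mainstream solutions answer with either an LLM or a DeBERTa classifier.
Use the title and first sentence to retrieve articles, then retrieve the relevant sentences from those articles,
or retrieve whole paragraphs directly (separated by '\n').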
6th code discussion
12th discussion-zhihu
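The two-stage retrieval described in the notes for this competition (articles first, via title plus first sentence, then sentences inside the retrieved articles) can be sketched roughly as below. This is a minimal illustration, not any team's actual pipeline: the TF-IDF scoring, the toy `articles` list, and the function names `top_k` and `retrieve_context` are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(texts):
    """Return one sparse TF-IDF dict per text, plus the idf table."""
    docs = [Counter(tokenize(t)) for t in texts]
    df = Counter(tok for d in docs for tok in d)
    n = len(docs)
    idf = {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}
    return [{tok: tf * idf[tok] for tok, tf in d.items()} for d in docs], idf


def cosine(a, b):
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def top_k(query, candidates, k):
    """Indices of the k candidates most similar to the query (assumed TF-IDF cosine)."""
    vecs, idf = tfidf_vectors(candidates)
    counts = Counter(tokenize(query))
    q = {tok: tf * idf.get(tok, 0.0) for tok, tf in counts.items()}
    scores = [cosine(q, v) for v in vecs]
    return sorted(range(len(candidates)), key=lambda i: -scores[i])[:k]


def retrieve_context(question, articles, n_articles=2, n_sentences=2):
    # Stage 1: match the question against title + first sentence only.
    keys = [a["title"] + " " + a["text"].split(". ")[0] for a in articles]
    hits = top_k(question, keys, n_articles)
    # Stage 2: rank individual sentences inside the retrieved articles.
    sentences = [
        s.strip()
        for i in hits
        for s in re.split(r"(?<=[.!?])\s+", articles[i]["text"])
        if s.strip()
    ]
    return [sentences[i] for i in top_k(question, sentences, n_sentences)]


# Toy corpus, invented for illustration only.
articles = [
    {"title": "Photosynthesis",
     "text": "Photosynthesis converts light energy into chemical energy. "
             "It takes place in chloroplasts. Oxygen is released as a by-product."},
    {"title": "Mitochondrion",
     "text": "The mitochondrion produces most of the cell's ATP. It has a double membrane."},
    {"title": "Plate tectonics",
     "text": "Plate tectonics describes the motion of the lithosphere. "
             "Earthquakes occur at plate boundaries."},
]
context = retrieve_context("Which organelle does photosynthesis take place in?", articles)
print("\n".join(context))
```

The retrieved sentences would then be prepended to the prompt for the LLM or DeBERTa answerer. Retrieving whole paragraphs instead amounts to splitting on '\n' rather than on sentence boundaries in stage 2.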
kaggle-LLM-Detect AI Generated Text
★★★★☆
1st code
LMSYS - Chatbot Arena Human Preference Predictions
1st code
2nd code
3rd discussion
WSDM 2024 Conversational Multi-Doc QA
1st code
KDD-Meta_RAG
KDD-Amazon
kaggle-AI Mathematical Olympiad
1st infer code
2nd code discussion
3rd infer code discussion
★★★★★
Covers PDF parsing, SQL generation, intent recognition, RAG retrieval, and prompt-based QA
★★★☆☆
PDF parsing, RAG, LLM prompt-based QA
4th code
6th discussion
13th code
ATEC2023 Tech Elite Competition (科技精英赛): Knowledge Injection for Large Models
7th code
2nd code
CCF AIOps 2024
1st discussion
2nd code
text classification / regression
Feedback Prize - English Language Learning
1st code discussion
3rd code discussion
13th code
kaggle-CommonLit Readability Prize
1st code discussion
2nd code discussion
3rd code discussion
4th code discussion
5th discussion
6th discussion
9th discussion
12th discussion
13th discussion
14th discussion
15th discussion
16th discussion
18th discussion
20th code discussion
22nd discussion
24th discussion
25th discussion
28th discussion
37th discussion
40th discussion
47th discussion
48th discussion
69th discussion
87th discussion
100th discussion
kaggle2022-Jigsaw Rate Severity of Toxic Comments
1st code
14th code discussion
kaggle2020-Jigsaw Multilingual Toxic Comment Classification
3rd code discussion
4th discussion
6th discussion
10th discussion
kaggle2019-Jigsaw Unintended Bias in Toxicity Classification
kaggle2018-Toxic Comment Classification Challenge
kaggle-Google QUEST Q&A Labeling
1st code discussion
named entity recognition
★★★★★
1st code discussion
2nd code discussion
3rd code discussion
4th code discussion
5th code discussion
9th code discussion
11th code discussion
baseline
https://github.com/abhishekkrthakur/long-text-token-classification
https://www.kaggle.com/code/cdeotte/pytorch-bigbird-ner-cv-0-615
https://www.kaggle.com/code/cdeotte/tensorflow-longformer-ner-cv-0-633
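AI Technology Innovation Competition (人工智能技术创新大赛): Product Title Entity Recognition
Daguan Cup (达观杯)
CCF Negative Financial Information and Entity Determination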
https://github.com/xiong666/ccf_financial_negative
https://github.com/Chevalier1024/CCF-BDCI-ABSA
https://github.com/rebornZH/2019-CCF-BDCI-NLP
text matching
Label propagation / Siamese RNN
kaggle-U.S. Patent Phrase to Phrase Matching
1st code discussion
2nd discussion
3rd discussion
5th discussion
7th discussion
8th discussion
10th discussion
12th discussion
24th discussion
31st code discussion
41st discussion
kaggle-TensorFlow 2.0 Question Answering
7th code
collection discussion
1st code discussion
24th code
unk code
1st code
4th code
6th code discussion
8th code
1st code discussion
4th code
128th code
http://www.wuyuanhao.com/2019/02/25/quora-insincere-questions-classification%e6%80%bb%e7%bb%93/
1st code
2nd code
https://zhuanlan.zhihu.com/p/533808475
COVID-19 Similar Sentence Pair Judgment Competition (新冠疫情相似句对判定大赛)
https://github.com/zzy99/epidemic-sentence-pair
https://github.com/daniellibin/nCoV-2019-sentence-similarity
https://mp.weixin.qq.com/s/B267GHm16ZIlKkxhOJmMqg
https://github.com/Rowchen/pytorch-for-Text-Matching
Information Retrieval
kaggle-Learning Equality - Curriculum Recommendations
★★★★★
A multilingual text-matching problem between content and topics
Retrieval: TF-IDF + transformer with ArcFace + rules
Ranking: LightGBM (LGB)
1st code discussion
2nd code discussion
3rd code discussion
4th code discussion
5th discussion
6th discussion
7th discussion
9th discussion
10th discussion
12th discussion
15th discussion
26th discussion
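For the retrieve-then-rank recipe noted for this competition (TF-IDF, an ArcFace-trained transformer, and rules for retrieval, followed by a LightGBM ranker), one plausible way to glue the stages together is to union the candidates from every retriever and turn each (topic, content) pair into a feature row. The sketch below shows only that merging step under stated assumptions: the retriever names, the toy scores, the `-1.0` missing-value sentinel, and the helpers `merge_candidates` and `to_feature_table` are all hypothetical, and the ranker itself is not shown.

```python
from collections import defaultdict

MISSING = -1.0  # assumed sentinel for "this retriever did not return the pair"


def merge_candidates(sources, top_n):
    """Union the top_n candidates of every retriever.

    sources: {retriever_name: {topic_id: [(content_id, score), ...]}}
    Returns {(topic_id, content_id): {feature_name: value}}.
    """
    rows = defaultdict(dict)
    for name, per_topic in sources.items():
        for topic, scored in per_topic.items():
            ranked = sorted(scored, key=lambda x: -x[1])[:top_n]
            for rank, (content, score) in enumerate(ranked, start=1):
                rows[(topic, content)][f"{name}_score"] = score
                rows[(topic, content)][f"{name}_rank"] = rank
    return rows


def to_feature_table(rows, feature_names, positives):
    """Flatten merged candidates into rows a gradient-boosted ranker could consume."""
    table = []
    for (topic, content), feats in sorted(rows.items()):
        table.append({
            "topic": topic,
            "content": content,
            "features": [feats.get(f, MISSING) for f in feature_names],
            "label": int((topic, content) in positives),
        })
    return table


# Toy retriever outputs, invented for illustration only.
sources = {
    "tfidf": {"t1": [("c1", 0.8), ("c2", 0.3)]},
    "embed": {"t1": [("c2", 0.9), ("c3", 0.7)]},  # e.g. ArcFace-trained encoder
    "rule": {"t1": [("c1", 1.0)]},                # hypothetical rule-based match
}
feature_names = [
    "tfidf_score", "tfidf_rank",
    "embed_score", "embed_rank",
    "rule_score", "rule_rank",
]
positives = {("t1", "c2")}

table = to_feature_table(merge_candidates(sources, top_n=2), feature_names, positives)
for row in table:
    print(row["topic"], row["content"], row["features"], row["label"])
```

A ranker would then be trained on `features` and `label`, grouped by `topic`, and its scores thresholded per topic to produce the final content list.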
EEDI
1st [code](https://github.com/rbiswasfc/eedi-mining-misconceptions)
10th [code](https://github.com/shyoulala/Kaggle_Eedi_2024_sayoulala)
★★★☆☆
Chinese text matching in an e-commerce setting. Few open-source solutions.
2nd code discussion
13th code
★★★★☆
★★★★★
Given the order of a notebook's code cells, predict the order of its markdown cells
1st code
https://github.com/louis-she/ai4code
https://www.kaggle.com/code/nickuzmenkov/ai4code-tensorflow-distilbert-baseline
WSDM2023 Pre-training for Web Search
1st code
1st code
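Several folds are used, each running the following procedure separately. Hard negatives are selected iteratively over multiple steps, ending with a top 20. The initial embedding model retrieves the top 1000/200; the 200 serve as hard negatives for fine-tuning the embedding model, which then retrieves 100; those 100 serve as negatives for the ranker, which selects the final 20.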
3rd code
7th code
8th code
9th code
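The staged hard-negative mining described in the first-place note above (retrieve, use the non-positive hits as hard negatives to fine-tune the retriever, retrieve again with a smaller cutoff, and hand those hits to the ranker as negatives) can be sketched as follows. This is a toy illustration only: the 2-d vectors, the cutoffs of 3 and 2 (standing in for the 200 and 100 mentioned above), and the `fine_tune` function, which is a crude stand-in for real contrastive training, are all assumptions.

```python
def top_k(query_vec, doc_vecs, k):
    """Ids of the k documents with the highest dot product against the query."""
    scores = {d: sum(q * x for q, x in zip(query_vec, v)) for d, v in doc_vecs.items()}
    return sorted(scores, key=lambda d: -scores[d])[:k]


def mine_hard_negatives(query_vecs, doc_vecs, positives, k):
    """For each query, the k best-scoring retrieved documents that are not positives."""
    mined = {}
    for q, vec in query_vecs.items():
        retrieved = top_k(vec, doc_vecs, k + len(positives[q]))
        mined[q] = [d for d in retrieved if d not in positives[q]][:k]
    return mined


def fine_tune(query_vecs, doc_vecs, positives, negatives, lr=0.5):
    """Toy stand-in for retriever fine-tuning (NOT real training).

    Moves each query vector toward the mean positive and away from the mean
    hard negative, which is the direction a contrastive loss would push.
    """
    def mean(vectors, i):
        return sum(v[i] for v in vectors) / len(vectors) if vectors else 0.0

    updated = {}
    for q, vec in query_vecs.items():
        pos = [doc_vecs[d] for d in positives[q]]
        neg = [doc_vecs[d] for d in negatives[q]]
        updated[q] = [x + lr * (mean(pos, i) - mean(neg, i)) for i, x in enumerate(vec)]
    return updated


# Toy data, invented for illustration only.
doc_vecs = {
    "d0": [1.0, 0.0], "d1": [0.9, 0.1], "d2": [0.8, 0.3],
    "d3": [0.1, 1.0], "d4": [0.0, 0.9],
}
query_vecs = {"q0": [1.0, 0.2]}
positives = {"q0": {"d2"}}

# Step 1: initial retriever -> hard negatives for fine-tuning the retriever.
retriever_negatives = mine_hard_negatives(query_vecs, doc_vecs, positives, k=3)
# Step 2: fine-tuned retriever -> a smaller set of negatives for the ranker.
tuned_queries = fine_tune(query_vecs, doc_vecs, positives, retriever_negatives)
ranker_negatives = mine_hard_negatives(tuned_queries, doc_vecs, positives, k=2)

print(retriever_negatives, ranker_negatives)
```

In a real setup each step would run per fold, and the ranker trained on the second set of negatives would pick the final top 20 candidates.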
MRC / information extraction
kaggle-Tweet Sentiment Extraction
1st code discussion
2nd discussion
3rd code
7th code
12th code discussion
CCKS & Baidu 2019 Entity Linking for Chinese Short Text
2019 Zhijiang Cup AI Competition: E-commerce Review Opinion Mining Track
https://github.com/eguilg/OpinioNet
https://github.com/srtianxia/opinion_mining
iFLYTEK 2020 Event Extraction Challenge
https://github.com/WuHuRestaurant/xf_event_extraction2020Top1
https://github.com/xiaoqian19940510/Event-Extraction
https://lonepatient.top/2022/07/12/gaiic_2022_ner_top10.html
https://github.com/luhua-rain/MRC_Competition_Dureader
https://github.com/YingZiqiang/LES-MMRC-Summary
https://github.com/basketballandlearn/MRC_Competition_Dureader
QA (question answering)
1st code
4th code
5th discussion
36th discussion
https://tianchi.aliyun.com/forum/postDetail?spm=5176.12586969.1002.3.2db024ddZShYhb&postId=10854
https://github.com/Dikea/Dialog-System-with-Task-Retrieval-and-Seq2seq
https://github.com/xueyouluo/S2S-in-Production
Other
2nd discussion