Natural Language Processing

Large Language Models (LLM)

kaggle-LLM Science Exam

★★★★★

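  • Questions drawn from Wikipedia knowledge: given a prompt and five options, predict the correct answer

  • The key is RAG optimization; mainstream solutions answer with either an LLM or a DeBERTa classifier

    • Use the title and first sentence to retrieve articles, then retrieve relevant sentences from those articles

    • Or retrieve whole paragraphs directly (separated by '\n')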
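The two-stage retrieval described above can be sketched roughly as follows. This is a minimal illustration, not any team's actual solution: it swaps in a plain bag-of-words cosine similarity for the dense or BM25 retrievers typically used, and the helper names (`retrieve_articles`, `retrieve_sentences`) and the article dict layout are assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def retrieve_articles(query, articles, k=2):
    """Stage 1: score each article by its title plus first sentence only."""
    q = Counter(tokenize(query))
    scored = []
    for art in articles:
        sents = split_sentences(art["text"])
        first = sents[0] if sents else ""
        key = Counter(tokenize(art["title"] + " " + first))
        scored.append((cosine(q, key), art))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [art for _, art in scored[:k]]


def retrieve_sentences(query, articles, k=2):
    """Stage 2: rank every sentence of the shortlisted articles."""
    q = Counter(tokenize(query))
    sents = [s for art in articles for s in split_sentences(art["text"])]
    sents.sort(key=lambda s: cosine(q, Counter(tokenize(s))), reverse=True)
    return sents[:k]


# Hypothetical toy corpus, for illustration only.
articles = [
    {"title": "Photosynthesis",
     "text": "Photosynthesis is the process plants use to convert light into "
             "chemical energy. It takes place in the chloroplasts."},
    {"title": "Mitochondrion",
     "text": "A mitochondrion is an organelle that produces ATP in cells. "
             "It has a double membrane."},
    {"title": "French Revolution",
     "text": "The French Revolution was a period of political change in "
             "France. It began in 1789."},
]
query = "Which organelle is the site of photosynthesis in plants?"
shortlist = retrieve_articles(query, articles, k=2)
context = retrieve_sentences(query, shortlist, k=2)
```

The retrieved `context` sentences would then be concatenated with the prompt and the five options before being passed to the LLM or classifier.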

kaggle-LLM-Detect AI Generated Text

★★★★☆

LMSYS - Chatbot Arena Human Preference Predictions

★★★★★ Input format: prompt+res_a+res_b, or prompt+res_a+prompt+res_b
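The two input layouts can be illustrated with a small helper. This is only a sketch: the function name `build_input`, the `repeat_prompt` flag, and the blank-line separator are assumptions for the example, and real solutions would normally use the tokenizer's own special tokens or chat template instead.

```python
def build_input(prompt, res_a, res_b, repeat_prompt=False, sep="\n\n"):
    """Concatenate a prompt and two candidate responses into one string.

    repeat_prompt=False -> prompt + res_a + res_b
    repeat_prompt=True  -> prompt + res_a + prompt + res_b
    """
    if repeat_prompt:
        parts = [prompt, res_a, prompt, res_b]
    else:
        parts = [prompt, res_a, res_b]
    return sep.join(parts)


single = build_input("What is 2+2?", "4", "Four, I think.")
paired = build_input("What is 2+2?", "4", "Four, I think.", repeat_prompt=True)
```

Repeating the prompt before each response makes the two halves symmetric, at the cost of a longer sequence.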

WSDM 2024 Conversational Multi-Doc QA

KDD-Meta_RAG

Question answering across multiple domains: "finance", "music", "movie", "sports", "open" (reports)

KDD-Amazon

kaggle-AI Mathematical Olympiad

Tianchi-SMP 2023 ChatGLM Financial LLM Challenge

★★★★★

  • Covers PDF parsing, SQL generation, intent recognition, RAG retrieval, and prompt-based QA

2023 Global Intelligent Vehicle AI Challenge

★★★☆☆

  • PDF parsing, RAG, LLM prompt-based QA

WSDM-kaggle

ATEC2023 Tech Elite Competition: Knowledge Injection for Large Models

LLM Fine-Tuning Data Competition

CCF AIOps 2024

Kaggle-ARC

Text classification / regression

Feedback Prize - English Language Learning

kaggle-CommonLit Readability Prize

kaggle2022-Jigsaw Rate Severity of Toxic Comments

kaggle2020-Jigsaw Multilingual Toxic Comment Classification

kaggle2019-Jigsaw Unintended Bias in Toxicity Classification

kaggle2018-Toxic Comment Classification Challenge

CCF-BDCI 2019 Internet News Sentiment Analysis

kaggle-Google QUEST Q&A Labeling


Named entity recognition

kaggle-NBME

kaggle-feedback

★★★★★

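AI Technology Innovation Competition: Product Title Entity Recognition

CCKS2019-Entity Linking for Chinese Short Text

Tianchi Traditional Chinese Medicine Instruction Entity Recognition Challenge

Daguan Cup

CCF Negative Financial Information and Entity Determination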

  • https://github.com/xiong666/ccf_financial_negative

  • https://github.com/Chevalier1024/CCF-BDCI-ABSA

  • https://github.com/rebornZH/2019-CCF-BDCI-NLP


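Text matching

Label propagation / Siamese RNN

kaggle-U.S. Patent Phrase to Phrase Matching

kaggle-TensorFlow 2.0 Question Answering

CCF-Real Estate Chat Question-Answer Matching

Tianchi-COVID-19 Similar Sentence Pair Judgment Competition

Kaggle: Quora Question Pairs

CCF-Relevance Between Technical Requirements and Technical Achievements

Tianchi-Xiaobu Assistant Dialogue Short-Text Semantic Matching

sohu-Text Matching

COVID-19 Similar Sentence Pair Judgment Competition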

  • https://github.com/zzy99/epidemic-sentence-pair

  • https://github.com/daniellibin/nCoV-2019-sentence-similarity

  • https://mp.weixin.qq.com/s/B267GHm16ZIlKkxhOJmMqg

  • https://github.com/Rowchen/pytorch-for-Text-Matching


Information retrieval

kaggle-Learning Equality - Curriculum Recommendations

★★★★★

  • A multilingual text-matching problem between content and topic

  • Recall: tfidf + transformer arcface + rule

  • Ranking: LGB
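As a rough illustration of the recall stage, the sketch below generates topic-to-content candidates with TF-IDF cosine similarity only. It is an assumption-laden simplification: the ArcFace-trained transformer embeddings and hand-written rules are omitted, the TF-IDF is a hand-rolled pure-Python version, and the function names and toy data are hypothetical. The `(topic, content, score)` triples it returns are the kind of rows a ranker such as LightGBM would then score with additional features.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on word characters (Unicode-aware)."""
    return re.findall(r"\w+", text.lower())


def tfidf_vectors(docs):
    """Return one L2-normalised {token: weight} dict per doc (smoothed IDF)."""
    tokenized = [tokenize(d) for d in docs]
    df = Counter(t for toks in tokenized for t in set(toks))
    n = len(docs)
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def recall_candidates(topics, contents, k=2):
    """For each topic, keep the k contents with the highest TF-IDF cosine."""
    vecs = tfidf_vectors(topics + contents)
    topic_vecs, content_vecs = vecs[:len(topics)], vecs[len(topics):]
    candidates = []
    for ti, tv in enumerate(topic_vecs):
        scores = [
            (sum(w * cv.get(t, 0.0) for t, w in tv.items()), ci)
            for ci, cv in enumerate(content_vecs)
        ]
        # Highest score first; ties broken by content index for determinism.
        scores.sort(key=lambda x: (-x[0], x[1]))
        candidates.extend((ti, ci, s) for s, ci in scores[:k])
    return candidates


# Hypothetical toy data, for illustration only.
topics = ["Adding fractions", "Photosynthesis in plants"]
contents = [
    "How to add fractions with unlike denominators",
    "Plants and photosynthesis explained",
    "World War II timeline",
]
pairs = recall_candidates(topics, contents, k=1)
```

In a full pipeline, candidates from several recall sources would be merged and deduplicated before ranking.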

EEDI

  • 1st [code](https://github.com/rbiswasfc/eedi-mining-misconceptions)

  • 10th [code](https://github.com/shyoulala/Kaggle_Eedi_2024_sayoulala)

Tianchi-Wentian Engine E-commerce Search Algorithm Competition

★★★☆☆

  • Chinese text matching in an e-commerce setting; few open-source solutions

wsdm2020

★★★★☆

kaggle-AI4code

★★★★★

  • Given the order of a notebook's code cells, predict the order of its markdown cells

WSDM2023 Pre-training for Web Search

KDD Cup 2024-AQA


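MRC / information extraction

kaggle-Tweet Sentiment Extraction

Tianchi-Ruijin Hospital MMC AI-Assisted Knowledge Graph Construction

Baidu AI Studio Event Extraction Competition

CCKS & Baidu 2019 Entity Linking for Chinese Short Text

2019 Zhijiang Cup AI Competition: E-commerce Review Opinion Mining Track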

  • https://github.com/eguilg/OpinioNet

  • https://github.com/srtianxia/opinion_mining

iFLYTEK 2020 Event Extraction Challenge

  • https://github.com/WuHuRestaurant/xf_event_extraction2020Top1

  • https://github.com/xiaoqian19940510/Event-Extraction

  • https://lonepatient.top/2022/07/12/gaiic_2022_ner_top10.html

  • https://github.com/luhua-rain/MRC_Competition_Dureader

  • https://github.com/YingZiqiang/LES-MMRC-Summary

  • https://github.com/basketballandlearn/MRC_Competition_Dureader


Question answering (QA)

CCKS2019 CKBQA

hinde

Traditional Chinese Medicine Literature Question Generation Challenge

  • https://tianchi.aliyun.com/forum/postDetail?spm=5176.12586969.1002.3.2db024ddZShYhb&postId=10854

  • https://github.com/Dikea/Dialog-System-with-Task-Retrieval-and-Seq2seq

  • https://github.com/xueyouluo/S2S-in-Production


Other

kaggle sign language recognition
