Natural Language Processing

Large Language Models (LLM)

kaggle-LLM Science Exam

★★★★★

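  • Wikipedia-based knowledge questions: given a prompt (the question) and five options, predict the correct answer

  • The key is optimizing RAG; mainstream solutions answer with either an LLM or a DeBERTa classifier

    • Retrieve articles using their title and first sentence, then retrieve the relevant sentences from those articles

    • Or retrieve whole paragraphs directly (separated by '\n')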
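Both retrieval variants above can be sketched with a plain TF-IDF cosine scorer from the standard library. This is a minimal sketch under stated assumptions. Articles are dicts with hypothetical `title` and `text` keys. Sentences are split naively on '.' and newlines. TF-IDF stands in for the stronger sparse or dense retrievers that real solutions used.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercased word tokens; a stand-in for a real tokenizer.
    return re.findall(r"\w+", text.lower())


def split_sentences(text):
    # Naive sentence split on '.' and newlines (assumption, not a real splitter).
    return [s.strip() for s in re.split(r"[.\n]", text) if s.strip()]


def top_k(query, docs, k):
    """Return indices of the k docs with the highest TF-IDF cosine to query."""
    toks = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    df = Counter(t for tf in toks for t in tf)
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}

    def vec(tf):
        # Query terms unseen in the docs are dropped.
        return {t: c * idf[t] for t, c in tf.items() if t in idf}

    def cos(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        na = math.sqrt(sum(w * w for w in a.values()))
        nb = math.sqrt(sum(w * w for w in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    q = vec(Counter(tokenize(query)))
    scores = [cos(q, vec(tf)) for tf in toks]
    return sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]


def retrieve_sentences(query, articles, k_articles=2, k_sents=3):
    # Variant 1: index each article by title + first sentence, then
    # retrieve sentences only from the top-ranked articles.
    keys = [a["title"] + " " + split_sentences(a["text"])[0] for a in articles]
    chosen = top_k(query, keys, k_articles)
    sents = [s for i in chosen for s in split_sentences(articles[i]["text"])]
    return [sents[i] for i in top_k(query, sents, k_sents)]


def retrieve_paragraphs(query, articles, k=3):
    # Variant 2: retrieve whole paragraphs (article text split on '\n').
    paras = [p for a in articles for p in a["text"].split("\n") if p.strip()]
    return [paras[i] for i in top_k(query, paras, k)]
```

Variant 1 keeps the first-stage index small, with one short key per article. The more expensive sentence-level scoring then runs only inside the few chosen articles.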

kaggle-LLM-Detect AI Generated Text

★★★★☆

LMSYS - Chatbot Arena Human Preference Predictions

★★★★★ Input format: prompt+res_a+res_b, or prompt+res_a+prompt+res_b
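Both input layouts can be built with one small helper. This is a minimal sketch. The `<prompt>` / `<response_a>` / `<response_b>` tags and the optional per-field character truncation are illustrative assumptions, not a format the competition prescribes.

```python
def format_pair(prompt, res_a, res_b, repeat_prompt=False, max_chars=None):
    """Build one text input for pairwise preference classification.

    repeat_prompt=False -> prompt + res_a + res_b
    repeat_prompt=True  -> prompt + res_a + prompt + res_b
    Tag strings and truncation are hypothetical choices for illustration.
    """

    def cut(s):
        # Optional truncation so one long field cannot crowd out the others.
        return s if max_chars is None else s[:max_chars]

    parts = ["<prompt>: " + cut(prompt), "<response_a>: " + cut(res_a)]
    if repeat_prompt:
        parts.append("<prompt>: " + cut(prompt))
    parts.append("<response_b>: " + cut(res_b))
    return "\n".join(parts)


print(format_pair("Hi", "Hello!", "Hey.", repeat_prompt=True))
```

Repeating the prompt before response B keeps the question next to each answer. This may help when the combined text is truncated to the model's maximum length, but whether it does is an empirical question.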

WSDM 2024 Conversational Multi-Doc QA

KDD-Meta_RAG

Question answering across multiple domains: "finance", "music", "movie", "sports", "open"

  • code

    • trained Sequence Classifiers as routers for Domain and Dynamism based on BGE-M3

  • report

    • Three stages: (1) Question Answering With LLM Parameterized Knowledge, (2) Question Answering With External Sources, (3) Final Answer Selection.

KDD-Amazon

kaggle-AI Mathematical Olympiad

Tianchi-SMP 2023 ChatGLM Financial LLM Challenge

★★★★★

  • Covers PDF parsing, SQL generation, intent recognition, RAG retrieval, and prompt-based QA
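The intent-recognition step decides which of the other components handles a question. This is a minimal keyword-based sketch that assumes three hypothetical branches: SQL over tables parsed from the PDFs, RAG over the PDF text, and a plain LLM prompt. The patterns are invented for illustration. A real solution would more likely prompt the LLM or train a classifier for this step.

```python
import re

# Hypothetical keyword rules, checked in order: SQL first, then RAG.
SQL_PATTERNS = [r"how many", r"average", r"highest", r"lowest", r"rank", r"total"]
RAG_PATTERNS = [r"annual report", r"\b20\d\d\b"]


def route(question):
    """Return the branch that should answer the question: 'sql', 'rag' or 'llm'."""
    q = question.lower()
    if any(re.search(p, q) for p in SQL_PATTERNS):
        return "sql"  # aggregate/statistical question -> generate SQL over parsed tables
    if any(re.search(p, q) for p in RAG_PATTERNS):
        return "rag"  # company/year-specific fact -> retrieve from parsed PDF text
    return "llm"      # general finance knowledge -> answer from the prompt alone
```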

2023 Global Intelligent Vehicle AI Challenge

★★★☆☆

  • PDF parsing, RAG, LLM prompt-based QA

WSDM-kaggle

ATEC2023 Tech Elite Competition: Knowledge Injection for LLMs

LLM Fine-tuning Data Competition

CCF AIOps 2024

Kaggle-ARC

text classification / regression

Feedback Prize - English Language Learning

kaggle-CommonLit Readability Prize

kaggle2022-Jigsaw Rate Severity of Toxic Comments

kaggle2020-Jigsaw Multilingual Toxic Comment Classification

kaggle2019-Jigsaw Unintended Bias in Toxicity Classification

kaggle2018-Toxic Comment Classification Challenge

CCF-BDCI 2019 Internet News Sentiment Analysis

kaggle-Google QUEST Q&A Labeling


named entity recognition (NER)

kaggle-NBME

kaggle-feedback

★★★★★

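AI Technology Innovation Competition: Product Title Entity Recognition

CCKS2019-Entity Linking for Chinese Short Texts

Tianchi Traditional Chinese Medicine Package Insert Entity Recognition Challenge

Daguan Cup

CCF Negative Financial Information and Subject Entity Determination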

  • https://github.com/xiong666/ccf_financial_negative

  • https://github.com/Chevalier1024/CCF-BDCI-ABSA

  • https://github.com/rebornZH/2019-CCF-BDCI-NLP


text matching

Label propagation / Siamese RNN
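One common reading of label propagation for pair matching is to exploit transitivity in the pair graph, a trick often used in Quora Question Pairs. If A duplicates B and B duplicates C, then A duplicates C. If A's cluster has a negative pair with D's cluster, then every cross pair between them is negative. This minimal union-find sketch assumes labeled pairs come as `(id_a, id_b, label)` tuples. It does not resolve inconsistent labels.

```python
class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        # Register unseen ids lazily; path halving keeps trees shallow.
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def propagate(pairs):
    """Infer labels for unseen pairs from labeled ones by transitivity.

    pairs: iterable of (a, b, label), label 1 = duplicate, 0 = not duplicate.
    Returns infer(a, b) -> 1, 0, or None when nothing can be inferred.
    """
    pairs = list(pairs)
    uf = UnionFind()
    for a, b, label in pairs:
        if label == 1:
            uf.union(a, b)  # duplicates of duplicates are duplicates
    # A negative edge between two clusters makes every cross pair negative.
    neg = {frozenset((uf.find(a), uf.find(b))) for a, b, label in pairs if label == 0}

    def infer(a, b):
        ra, rb = uf.find(a), uf.find(b)
        if ra == rb:
            return 1
        if frozenset((ra, rb)) in neg:
            return 0
        return None

    return infer
```

The inferred pairs can be added as extra training data or used as graph features.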

kaggle-U.S. Patent Phrase to Phrase Matching

kaggle-TensorFlow 2.0 Question Answering

CCF-Real Estate Chat Q&A Matching

Tianchi-COVID-19 Similar Sentence Pair Judgment Competition

Kaggle: Quora Question Pairs

  • 4th code

  • 128th code

  • http://www.wuyuanhao.com/2019/02/25/quora-insincere-questions-classification%e6%80%bb%e7%bb%93/

CCF-Relevance Between Technology Demands and Technology Achievement Projects

Tianchi-Xiaobu Assistant Dialogue Short-Text Semantic Matching

Sohu-Text Matching

  • 2nd code

  • https://zhuanlan.zhihu.com/p/533808475

COVID-19 Similar Sentence Pair Judgment Competition

  • https://github.com/zzy99/epidemic-sentence-pair

  • https://github.com/daniellibin/nCoV-2019-sentence-similarity

  • https://mp.weixin.qq.com/s/B267GHm16ZIlKkxhOJmMqg

  • https://github.com/Rowchen/pytorch-for-Text-Matching


information retrieval

kaggle-Learning Equality - Curriculum Recommendations

★★★★★

  • Multilingual text matching between content items and topics

  • Retrieval: TF-IDF + transformer with ArcFace + rules

  • Ranking: LightGBM
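The retrieve-then-rank layout above can be sketched as follows. This minimal sketch assumes each retrieval channel (TF-IDF, the ArcFace-trained transformer, rules) has already produced a best-first candidate list per topic. The point is the candidate union and the per-channel features. The hand-weighted linear scorer is a hypothetical stand-in for the LightGBM ranker, which would be trained on features of this kind. Only numeric features should be given weights, because missing ranks are `None`.

```python
def build_candidates(channels):
    """Merge candidate lists from several retrieval channels.

    channels: channel_name -> topic_id -> list of (content_id, score), best first.
    Returns (topic_id, content_id) -> feature dict with each channel's score and
    rank (0.0 / None when the channel did not retrieve the pair).
    """
    feats = {}
    for name, per_topic in channels.items():
        for topic, cands in per_topic.items():
            for rank, (content, score) in enumerate(cands):
                row = feats.setdefault((topic, content), {})
                row[name + "_score"] = score
                row[name + "_rank"] = rank
    names = list(channels)
    for row in feats.values():
        # Give every row the same keys, plus how many channels found the pair.
        for name in names:
            row.setdefault(name + "_score", 0.0)
            row.setdefault(name + "_rank", None)
        row["n_channels"] = sum(row[n + "_rank"] is not None for n in names)
    return feats


def rank_candidates(feats, weights, threshold):
    """Score each pair with hypothetical linear weights; keep those >= threshold."""
    kept = {}
    for (topic, content), row in feats.items():
        s = sum(w * row[f] for f, w in weights.items())
        if s >= threshold:
            kept.setdefault(topic, []).append((content, s))
    for topic in kept:
        kept[topic].sort(key=lambda x: x[1], reverse=True)
    return kept
```

Taking the union across channels raises recall. Features such as `n_channels` then let the ranker learn how much to trust agreement between channels.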

EEDI

  • 1st code: https://github.com/rbiswasfc/eedi-mining-misconceptions

  • 10th code: https://github.com/shyoulala/Kaggle_Eedi_2024_sayoulala

Tianchi-Wentian Engine E-commerce Search Algorithm Competition

★★★☆☆

  • Chinese text matching in an e-commerce setting; few open-source solutions

wsdm2020

★★★★☆

kaggle-AI4code

★★★★★

  • Given the order of a notebook's code cells, predict the order of its markdown cells

  • 1st code

  • https://github.com/louis-she/ai4code

  • https://www.kaggle.com/code/nickuzmenkov/ai4code-tensorflow-distilbert-baseline
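One common way to frame the AI4Code task is pointwise ranking, though this is not necessarily the 1st-place method. Code cells keep their known order and get evenly spaced positions in (0, 1). A model predicts a relative position for each markdown cell, and all cells are then sorted by position. This minimal sketch covers only the merge step and assumes the markdown predictions already exist. The prediction values in the test are made up.

```python
def order_cells(code_ids, md_preds):
    """Merge markdown cells into the fixed code-cell order.

    code_ids: code cell ids in their true notebook order.
    md_preds: markdown cell id -> predicted relative position in [0, 1]
              (hypothetical model output).
    Returns all cell ids sorted by position.
    """
    n = len(code_ids)
    # Evenly spaced positions strictly inside (0, 1) for the code cells.
    positions = {cid: (i + 1) / (n + 1) for i, cid in enumerate(code_ids)}
    positions.update(md_preds)
    code_set = set(code_ids)
    # On ties, place markdown before code (markdown often introduces code).
    return sorted(positions, key=lambda c: (positions[c], c in code_set))
```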

WSDM2023 Pre-training for Web Search

KDD Cup 2024-AQA

  • 1st code

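    • Done separately for each fold. Hard negatives are mined iteratively over several steps, and the top 20 are finally selected. The initial embedding model retrieves 1000/200 candidates, and the 200 are used as hard negatives to fine-tune the embedding model. The fine-tuned model then retrieves 100 candidates, which serve as negatives for the ranking model. The ranking model selects the final 20.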

  • 3rd code

  • 7th code

  • 8th code

  • 9th code
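The 1st-place AQA pipeline described above passes candidates and hard negatives between stages. This can be sketched as follows, assuming each model is a hypothetical scoring callable `score(query, doc_text) -> float`. The default depths (1000, 200, 100, 20) follow the bullet. In the real pipeline the models are fine-tuned between stages on the mined negatives, per fold. This sketch takes the three scorers as already trained and only shows the data flow.

```python
import heapq


def retrieve(query, corpus, score, k):
    # Top-k doc ids under a scoring function, best first.
    return heapq.nlargest(k, corpus, key=lambda d: score(query, corpus[d]))


def hard_negatives(ranked, positives, k):
    # Highest-ranked ids that are not labeled positive = hard negatives.
    return [d for d in ranked if d not in positives][:k]


def mine(queries, corpus, positives, score_init, score_tuned, score_rank,
         k_init=1000, n_embed_neg=200, k_tuned=100, k_final=20):
    """Per-query outputs of the three stages.

    queries: qid -> text; corpus: doc_id -> text; positives: qid -> set of ids.
    score_*: hypothetical callables standing in for the initial embedding
    model, the fine-tuned embedding model, and the ranker.
    """
    out = {}
    for qid, q in queries.items():
        pos = positives.get(qid, set())
        # Stage 1: initial embedder; its top non-positives train the embedder.
        stage1 = retrieve(q, corpus, score_init, k_init)
        # Stage 2: fine-tuned embedder; its candidates train the ranker.
        stage2 = retrieve(q, corpus, score_tuned, k_tuned)
        # Stage 3: the ranker re-scores stage-2 candidates and keeps k_final.
        final = heapq.nlargest(k_final, stage2, key=lambda d: score_rank(q, corpus[d]))
        out[qid] = {
            "embed_negatives": hard_negatives(stage1, pos, n_embed_neg),
            "rank_negatives": hard_negatives(stage2, pos, k_tuned),
            "final": final,
        }
    return out
```

A common reason to mine per fold is so that the model producing negatives for a query was not trained on that query. This is an assumption about the motivation; the source states only that it was done per fold.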


MRC / information extraction

kaggle-Tweet Sentiment Extraction

Tianchi-Ruijin Hospital MMC AI-Assisted Knowledge Graph Construction

Baidu AI Studio Event Extraction Competition

CCKS & Baidu 2019 Entity Linking for Chinese Short Texts

2019 Zhijiang Cup AI Competition: E-commerce Review Opinion Mining Track

  • https://github.com/eguilg/OpinioNet

  • https://github.com/srtianxia/opinion_mining

iFLYTEK 2020 Event Extraction Challenge

  • https://github.com/WuHuRestaurant/xf_event_extraction2020Top1

  • https://github.com/xiaoqian19940510/Event-Extraction

  • https://lonepatient.top/2022/07/12/gaiic_2022_ner_top10.html

  • https://github.com/luhua-rain/MRC_Competition_Dureader

  • https://github.com/YingZiqiang/LES-MMRC-Summary

  • https://github.com/basketballandlearn/MRC_Competition_Dureader


QA (question answering)

CCKS2019 CKBQA

hinde

Traditional Chinese Medicine Literature Question Generation Challenge

  • https://tianchi.aliyun.com/forum/postDetail?spm=5176.12586969.1002.3.2db024ddZShYhb&postId=10854

https://github.com/Dikea/Dialog-System-with-Task-Retrieval-and-Seq2seq

https://github.com/xueyouluo/S2S-in-Production


other

Kaggle Sign Language Recognition
