Research
My research interests focus on Computational Healthcare, Computational Biology, Biomedical Text Mining (BioNLP), Bioinformatics, Machine Learning and Graph Learning.
Selected Publications
Buchao Zhan, Yucong Duan, Xin Yang, Dongmei He, and Shankai Yan✉. (2024). Text2SPARQL: Grammar Pre-training for Text-to-QDMR Semantic Parsers from Intermediate Question. ICONIP2024 (Accepted) Code Data
Buchao Zhan, Anqi Li, Xin Yang, Dongmei He, Yucong Duan, and Shankai Yan✉. (2024). RARoK: Retrieval-Augmented Reasoning on Knowledge for Medical Question Answering. BIBM2024 (Accepted) Code Data
Dongmei He, Xin Yang, Buchao Zhan, Zilong Zhang, Qingchen Zhang, and Shankai Yan✉. (2024). Augmented Mycobiome-Based Cancer Detection by an Interpretable Large Model. BIBM2024 (Accepted) Code Data
Shankai Yan, Ling Luo, Li Fang, Daniel Veltri, Andrew J. Oler, Rajarshi Ghosh, Chih-Hsuan Wei, Morgan Similuk, Kai Wang, and Zhiyong Lu✉. (2022). PhenoGene: Disease-gene prioritization using graph embedding on patient phenotypic profiles. AMIA2022
Shankai Yan, Ling Luo, Po-Ting Lai, Daniel Veltri, Andrew J. Oler, Sandhya Xirasagar, Rajarshi Ghosh, Morgan Similuk, Peter N Robinson, and Zhiyong Lu✉. (2022). PhenoRerank: A re-ranking model for phenotypic concept recognition pre-trained on human phenotype ontology. Journal of Biomedical Informatics 129: 104059. Code Data
Shankai Yan and Ka-Chun WONG✉. (2019). Context awareness and embedding for biomedical event extraction. Oxford Bioinformatics 36(2): 637-643. Code Data
Ling Luo, Shankai Yan, Po-Ting Lai, Daniel Veltri, Andrew Oler, Sandhya Xirasagar, Rajarshi Ghosh, Morgan Similuk, Peter N Robinson, and Zhiyong Lu✉. (2021). PhenoTagger: a hybrid method for phenotype concept recognition using human phenotype ontology. Oxford Bioinformatics 37(13): 1884-1890. Code Data
Qingyu Chen, Robert Leaman, Alexis Allot, Ling Luo, Chih-Hsuan Wei, Shankai Yan, and Zhiyong Lu✉. (2021). Artificial Intelligence in Action: Addressing the COVID-19 Pandemic with Natural Language Processing. Annual Review of Biomedical Data Science 4: 313-339.
Qingyu Chen, Kyubum Lee, Shankai Yan, Sun Kim, and Zhiyong Lu✉. (2020). BioConceptVec: creating and evaluating literature-based biomedical concept embeddings on a large scale. PLOS Computational Biology 16(4): e1007617. Code Data
Yifan Peng, Shankai Yan and Zhiyong Lu✉. (2019). Transfer Learning in Biomedical Natural Language Processing: An Evaluation of BERT and ELMo on Ten Benchmarking Datasets. ACL BioNLP Workshop. Code Data
Ka-Chun WONG✉, Junyi Chen, Jiao Zhang, Jiecong Lin, Shankai Yan, Shixiong Zhang, Xiangtao Li, Cheng Liang, Chengbin Peng, Qiuzhen Lin, Sam Kwong, and Jun Yu. (2019). Early Cancer Detection from Multianalyte Blood Test Results. CellPress iScience 15: 332-341.
Shankai Yan and Ka-Chun WONG✉. (2019). GESgnExt: Gene Expression Signature Extraction and Meta-analysis on Gene Expression Omnibus. IEEE Journal of Biomedical and Health Informatics 24(1): 311-318. Code Data
Shankai Yan and Ka-Chun WONG✉. (2017). Elucidating high-dimensional cancer hallmark annotation via enriched ontology. Journal of Biomedical Informatics 73: 84-94. Code Data
Other Publications
Siqi Dong, Buchao Zhan and Shankai Yan✉. (2024). Food Named Entity Recognition with BERT and Adversarial Training. MLNLP2024 (Accepted)
Yuchen Ma, Buchao Zhan, Jianhua Yu and Shankai Yan✉. (2024). SACMR: Sentiment Analysis in Chinese Language using Modified RoBERTa. ICCIA2024 (Accepted)
Buchao Zhan, Yucong Duan and Shankai Yan✉. (2024). IC-BERT: An Instruction Classifier Model Alleviates the Hallucination of Large Language Models in Traditional Chinese Medicine. ICCIA2024 (Accepted)
Yanbo Han, Buchao Zhan, Bin Zhang, Chao Zhao and Shankai Yan✉. (2024). BiCalBERT: An Efficient Transformer-based Model for Chinese Question Answering. Proceedings of the 2024 International Conference on Intelligent Systems, Metaheuristics & Swarm Intelligence: 100-104.
Zhe Liu, Hiu-Man Wong, Xingjian Chen, Jiecong Lin, Shixiong Zhang, Shankai Yan, Fuzhou Wang, Xiangtao Li, Ka-Chun Wong✉. (2023). MotifHub: Detection of trans-acting DNA motif group with probabilistic modeling algorithm. Computers in Biology and Medicine 168: 107753.
Ruiqi Liu, Xiuhao Fu, Shankai Yan, Zilong Zhang✉, and Feifei Cui. (2023). AIPPT: Predicts anti-inflammatory peptides using the most characteristic subset of bases and sequences by stacking ensemble learning strategies. BIBM 2023.
Hiu-Man Wong, Xingjian Chen, Hiu-Hin Tam, Jiecong Lin, Shixiong Zhang, Shankai Yan, Xiangtao Li, Ka-Chun Wong✉. (2021). Feature Selection and Feature Extraction: Highlights. Proceedings of the 2021 International Conference on Intelligent Systems, Metaheuristics & Swarm Intelligence: 49-53.
Shankai Yan and Ka-Chun WONG✉. (2021). Future DNA computing device and accompanied tool stack: Towards high-throughput computation. Future Generation Computer Systems 117: 111-124.
Ka-Chun WONG✉, Jiao Zhang, Shankai Yan, Xiangtao Li, Qiuzhen Lin, Sam KWONG and Cheng Liang. (2019). DNA Sequencing Technologies: Sequencing Data Protocols and Bioinformatics Tools. ACM Computing Surveys 52(5): 1-30.
Ka-Chun WONG✉, Shankai Yan, Qiuzhen Lin, Xiangtao Li and Chengbin Peng. (2018). Deleterious Non-Synonymous Single Nucleotide Polymorphism Predictions on Human Transcription Factors. IEEE/ACM Transactions on Computational Biology and Bioinformatics 17(1): 327-333.
Junyi Chen, Shankai Yan and Ka-Chun WONG✉. (2018). Verbal aggression detection on Twitter comments: convolutional neural network for short-text sentiment analysis. Neural Computing and Applications 32: 10809-10818.
Ka-Chun WONG✉, Chengbin Peng, Shankai Yan and Cheng Liang. (2017). Probabilistic Inference on Multiple Normalized Genome-Wide Signal Profiles With Model Regularization. IEEE Transactions on NanoBioscience 16(1): 43-50.
Junyi Chen, Shankai Yan and Ka-Chun WONG✉. (2017). Aggressivity Detection on Social Network Comments. Proceedings of the 2017 International Conference on Intelligent Systems, Metaheuristics & Swarm Intelligence: 103-107.
Abstracts
Shankai Yan, Kathleen Steinhofel✉, Paul Bates, Mariam Molokhia. (2018). Novel HLA Subclass Clustering methods to characterize Liver Toxicity Phenotype. ACPE 2018. (talk)
Working Papers and Projects in Progress
Heterogeneous Graph-based Phenotype Embedding
Phenotype-driven Gene/Disease Prioritization with HyperGraph Embedding
Graph-embedding-based Linkage Analysis and Association Study
Biomedical Question Answering with Knowledge Graphs and Chain of Thought
Detection of Psychological Abnormality from Social Media
Food-Ingredients-Phenotype/Disease Relation Recognition
Single-cell Sequencing Data Encoder for Drug Perturbation Prediction