My research interests focus on Computational Healthcare, Computational Biology, Biomedical Text Mining (BioNLP), Bioinformatics, Machine Learning and Graph Learning. My current projects focus on phenotype annotation and embedding. Here you can find materials relevant to my published papers.

I also list the titles of some working papers and projects in progress below. Full links are attached to these papers once I believe they are ready for peer review. Feel free to contact me if you are interested in any of this work.

Selected Publications

Shankai Yan, Ling Luo, Li Fang, Daniel Veltri, Andrew J. Oler, Rajarshi Ghosh, Chih-Hsuan Wei, Morgan Similuk, Kai Wang, and Zhiyong Lu✉. (2022). PhenoGene: Disease-gene prioritization using graph embedding on patient phenotypic profiles. AMIA 2022.

Shankai Yan, Ling Luo, Po-Ting Lai, Daniel Veltri, Andrew J. Oler, Sandhya Xirasagar, Rajarshi Ghosh, Morgan Similuk, Peter N Robinson, and Zhiyong Lu✉. (2022). PhenoRerank: A re-ranking model for phenotypic concept recognition pre-trained on human phenotype ontology. Journal of Biomedical Informatics 129: 104059. Code Data

Shankai Yan and Ka-Chun WONG✉. (2019). Context awareness and embedding for biomedical event extraction. Oxford Bioinformatics 36(2): 637-643. Code Data

Ling Luo, Shankai Yan, Po-Ting Lai, Daniel Veltri, Andrew Oler, Sandhya Xirasagar, Rajarshi Ghosh, Morgan Similuk, Peter N Robinson, and Zhiyong Lu✉. (2021). PhenoTagger: a hybrid method for phenotype concept recognition using human phenotype ontology. Oxford Bioinformatics 37(13): 1884-1890. Code Data

Qingyu Chen, Robert Leaman, Alexis Allot, Ling Luo, Chih-Hsuan Wei, Shankai Yan, and Zhiyong Lu✉. (2021). Artificial Intelligence in Action: Addressing the COVID-19 Pandemic with Natural Language Processing. Annual Review of Biomedical Data Science 4: 313-339.

Qingyu Chen, Kyubum Lee, Shankai Yan, Sun Kim, and Zhiyong Lu✉. (2020). BioConceptVec: creating and evaluating literature-based biomedical concept embeddings on a large scale. PLOS Computational Biology 16(4): e1007617. Code Data

Yifan Peng, Shankai Yan and Zhiyong Lu✉. (2019). Transfer Learning in Biomedical Natural Language Processing: An Evaluation of BERT and ELMo on Ten Benchmarking Datasets. ACL BioNLP Workshop. Code Data

Ka-Chun WONG✉, Junyi Chen, Jiao Zhang, Jiecong Lin, Shankai Yan, Shixiong Zhang, Xiangtao Li, Cheng Liang, Chengbin Peng, Qiuzhen Lin, Sam Kwong, and Jun Yu. (2019). Early Cancer Detection from Multianalyte Blood Test Results. CellPress iScience 15: 332-341.

Shankai Yan and Ka-Chun WONG✉. (2019). GESgnExt: Gene Expression Signature Extraction and Meta-analysis on Gene Expression Omnibus. IEEE Journal of Biomedical and Health Informatics 24(1): 311-318. Code Data

Shankai Yan and Ka-Chun WONG✉. (2017). Elucidating high-dimensional cancer hallmark annotation via enriched ontology. Journal of Biomedical Informatics 73: 84-94. Code Data

Other Publications

Yanbo Han, Buchao Zhan, Bin Zhang, Chao Zhao and Shankai Yan✉. (2024). BiCalBERT: An Efficient Transformer-based Model for Chinese Question Answering. ISMSI 2024 (Accepted)

Zhe Liu, Hiu-Man Wong, Xingjian Chen, Jiecong Lin, Shixiong Zhang, Shankai Yan, Fuzhou Wang, Xiangtao Li, and Ka-Chun Wong✉. (2023). MotifHub: Detection of trans-acting DNA motif group with probabilistic modeling algorithm. Computers in Biology and Medicine (Accepted)

Ruiqi Liu, Xiuhao Fu, Shankai Yan, Zilong Zhang✉, and Feifei Cui. (2023). AIPPT: Predicts anti-inflammatory peptides using the most characteristic subset of bases and sequences by stacking ensemble learning strategies. IEEE BIBM (Accepted)

Hiu-Man Wong, Xingjian Chen, Hiu-Hin Tam, Jiecong Lin, Shixiong Zhang, Shankai Yan, Xiangtao Li, and Ka-Chun Wong✉. (2021). Feature Selection and Feature Extraction: Highlights. ISMSI: 49-53.

Shankai Yan and Ka-Chun WONG✉. (2021). Future DNA computing device and accompanied tool stack: Towards high-throughput computation. Future Generation Computer Systems 117: 111-124.

Ka-Chun WONG✉, Jiao Zhang, Shankai Yan, Xiangtao Li, Qiuzhen Lin, Sam KWONG and Cheng Liang. (2019). DNA Sequencing Technologies: Sequencing Data Protocols and Bioinformatics Tools. ACM Computing Surveys 52(5): 1-30.

Ka-Chun WONG✉, Shankai Yan, Qiuzhen Lin, Xiangtao Li and Chengbin Peng. (2018). Deleterious Non-Synonymous Single Nucleotide Polymorphism Predictions on Human Transcription Factors. IEEE/ACM Transactions on Computational Biology and Bioinformatics 17(1): 327-333.

Junyi Chen, Shankai Yan and Ka-Chun WONG✉. (2018). Verbal aggression detection on Twitter comments: convolutional neural network for short-text sentiment analysis. Neural Computing and Applications 32: 10809-10818.

Ka-Chun WONG✉, Chengbin Peng, Shankai Yan and Cheng Liang. (2017). Probabilistic Inference on Multiple Normalized Genome-Wide Signal Profiles With Model Regularization. IEEE Transactions on NanoBioscience 16(1): 43-50.

Junyi Chen, Shankai Yan and Ka-Chun WONG✉. (2017). Aggressivity Detection on Social Network Comments. Proceedings of the 2017 International Conference on Intelligent Systems, Metaheuristics & Swarm Intelligence 103-107.


Shankai Yan, Kathleen Steinhofel✉, Paul Bates, and Mariam Molokhia. (2018). Novel HLA Subclass Clustering methods to characterize Liver Toxicity Phenotype. ACPE 2018. (talk)

Working Papers and Projects in Progress

Fine-grained Phenotype Concept Recognition & Phenotype Embedding & Phenotype-driven Gene/Disease Prioritization & Graph-embedding-based Linkage Analysis and Association Study