Acknowledgements
We are grateful to Kentaro Tomii and Toshiyuki Oda for constructive discussions. Computations were partially performed on the NIG supercomputer at the ROIS National Institute of Genetics and on the Shirokane supercomputer system at the Human Genome Center, Institute of Medical Science, University of Tokyo.
Funding
This work was supported in part by the Top Global University Project from the Ministry of Education, Culture, Sports, Science, and Technology of Japan (MEXT); by KAKENHI from the Japan Society for the Promotion of Science (JSPS) under Grant Number 18K18143; and by the Platform Project for Supporting in Drug Discovery and Life Science Research (Basis for Supporting Innovative Drug Discovery and Life Science Research (BINDS)) from AMED under Grant Number JP18am0101067. The funding bodies played no role in the design of the study; in the collection, analysis, or interpretation of data; or in writing the manuscript.
Abbreviations
HMM: Hidden Markov model
LSTM: Long short-term memory
pAUC: Partial area under the ROC curve
PSSM: Position-specific scoring matrix
RNN: Recurrent neural network
ROC: Receiver operating characteristic
Authors’ contributions
KDY conducted the computational experiments and wrote the manuscript. KK supervised the study and wrote the manuscript. Both authors have read and approved the final manuscript.
Notes
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Footnotes
Electronic supplementary material
The online version of this article (10.1186/s12859-018-2284-1) contains supplementary material, which is available to authorized users.
Contributor Information
Kazunori D. Yamada, Email: kyamada@ecei.tohoku.ac.jp
Kengo Kinoshita, Email: kengo@ecei.tohoku.ac.jp
Articles from BMC Bioinformatics are provided here courtesy of BMC.