---
abstract: 'Standard hybrid learners that use domain knowledge require strong knowledge that is hard and expensive to acquire. However, weaker forms of domain knowledge can still provide the benefits of prior knowledge while being cost-effective to obtain. Weak knowledge in the form of feature relative importance (FRI) is presented and explained. Feature relative importance is a real-valued approximation of a feature’s importance provided by experts. The advantage of using this knowledge is demonstrated by IANN, a modified multilayer neural network algorithm. IANN is a very simple modification of the standard neural network algorithm, yet it attains significant performance gains. Experimental results on problems from molecular biology show higher performance than other empirical learning algorithms, including standard backpropagation and support vector machines. The performance of IANN is even comparable to that of KBANN, a theory refinement system that uses stronger domain knowledge. This shows that feature relative importance can significantly improve the performance of existing empirical learning algorithms with minimal effort.'
altloc:
- http://arxiv.org/abs/1005.5556
chapter: ~
commentary: ~
commref: ~
confdates: ~
conference: ~
confloc: ~
contact_email: ~
creators_id:
- stopofeger@yahoo.com
creators_name:
- family: Iqbal
  given: Ridwan Al
  honourific: ''
  lineage: ''
date: 2010-06-04
date_type: submitted
datestamp: 2010-06-06 14:35:46
department: ~
dir: disk0/00/00/68/55
edit_lock_since: ~
edit_lock_until: 0
edit_lock_user: ~
editors_id: []
editors_name: []
eprint_status: archive
eprintid: 6855
fileinfo: /style/images/fileicons/application_pdf.png;/6855/1/fri.pdf
full_text_status: public
importid: ~
institution: ~
isbn: ~
ispublished: unpub
issn: ~
item_issues_comment: []
item_issues_count: 0
item_issues_description: []
item_issues_id: []
item_issues_reported_by: []
item_issues_resolved_by: []
item_issues_status: []
item_issues_timestamp: []
item_issues_type: []
keywords: 'neural network, domain knowledge, prior knowledge, feature importance'
lastmod: 2011-03-11 08:57:37
latitude: ~
longitude: ~
metadata_visibility: show
note: ~
number: ~
pagerange: ~
pubdom: FALSE
publication: ~
publisher: ~
refereed: FALSE
referencetext: "1\r\nWinston, P. H. Learning structural descriptions from examples. MIT Technical Report, 1970.\r\n2\r\nPazzani, M., Mani, S., and Shankle, W. R. Comprehensible knowledge discovery in databases. In CogSci-97 (1997).\r\n3\r\nSimard, P. S., Victorri, B., LeCun, Y., and Denker, J. Tangent Prop - A formalism for specifying selected invariances in an adaptive network. In Advances in Neural Information Processing Systems (San Mateo, CA 1992), Morgan Kaufmann.\r\n4\r\nPazzani, M., Brunk, C., and Silverstein, G. A knowledge-intensive approach to learning relational concepts. In Proceedings of the Eighth International Workshop on Machine Learning (San Francisco 1991), 432-436.\r\n5\r\nMahoney, J. Jeffrey and Mooney, Raymond J. Combining Symbolic and Neural Learning to Revise Probabilistic Theories. In Proceedings of the 1992 Machine Learning Workshop on Integrated Learning in Real Domains (1992).\r\n6\r\nTowell, G. G. and Shavlik, J. W. Knowledge-based artificial neural networks. Artif. Intel., 70 (1994), 50-62.\r\n7\r\nFung, G., Mangasarian, O., and Shavlik, J. Knowledge-Based Support Vector Machine Classifiers. In Proceedings of the Sixteenth Conference on Neural Information Processing Systems (NIPS) (Vancouver, Canada 2002).\r\n8\r\nScott, A., Clayton, J., and Gibson, E. A practical guide to knowledge acquisition. Addison-Wesley, 1991.\r\n9\r\nMarcus, S. (Ed.). Special issue on knowledge acquisition. Mach. Learn., 4 (1989).\r\n10\r\nBekkerman, R., El-Yaniv, R., Tishby, N., and Winter, Y. Distributional word clusters vs. words for text categorization. JMLR, 3 (2003), 1183-1208.\r\n11\r\nRuck, Dennis W., Rogers, Steven K., and Kabrisky, Matthew. Feature Selection Using a Multilayer Perceptron. Journal of Neural Network Computing, 2 (1990), 40-48.\r\n12\r\nGuyon, Isabelle and Elisseeff, André. An Introduction to Variable and Feature Selection. Journal of Machine Learning Research, 3 (2003), 1157-1182.\r\n13\r\nFriedman, J. Greedy function approximation: a gradient boosting machine. Annals of Statistics, 29 (2001), 1189-1232.\r\n14\r\nZien, Alexander, Krämer, Nicole, Sonnenburg, Sören, and Rätsch, Gunnar. The Feature Importance Ranking Measure. In ECML 2009 (2009).\r\n15\r\nMitchell, Tom M. Artificial neural networks. In Machine Learning. McGraw-Hill, 1997.\r\n16\r\nMitchell, Tom M. Artificial neural networks. In Machine Learning. McGraw-Hill Science/Engineering/Math, 1997.\r\n17\r\nQuinlan, J. R. C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, CA, 1993.\r\n18\r\nAha, D., Kibler, D., and Albert, M. Instance-based learning algorithms. Machine Learning, 6 (1991), 37-66.\r\n19\r\nVapnik, V. N. Statistical Learning Theory. Wiley, New York, 1998.\r\n20\r\nTowell, G., Shavlik, J., and Noordewier, M. Refinement of Approximate Domain Theories by Knowledge-Based Neural Networks. In Proceedings of the Eighth National Conference on Artificial Intelligence (Boston, MA 1990), 861-866.\r\n21\r\nNoordewier, M., Towell, G., and Shavlik, J. Training Knowledge-Based Neural Networks to Recognize Genes in DNA Sequences. In Advances in Neural Information Processing Systems (Denver, CO 1991), Morgan Kaufmann, 530-536."
relation_type: []
relation_uri: []
reportno: ~
rev_number: 17
series: ~
source: ~
status_changed: 2010-06-06 14:35:46
subjects:
- comp-sci-mach-learn
succeeds: ~
suggestions: ~
sword_depositor: ~
sword_slug: ~
thesistype: ~
title: Empirical learning aided by weak domain knowledge in the form of feature importance
type: preprint
userid: 10433
volume: ~