The present disclosure relates to feature generation during information extraction and, more specifically, to reducing memory usage in feature generation during information extraction.
When information such as a named entity or a relation is extracted from text by machine learning, a corpus is read and a model is then generated. In relation extraction, two or more named entity mentions (hereinafter also referred to as “NEs”) are used as input to determine whether a predefined relation exists. Features used in relation extraction are often generated from the NEs and the data between the NEs. For feature generation, a feature template such as “AllWords” may be used. The “AllWords” feature template is a scheme that concatenates the tokens belonging to the two NEs and the tokens between the NEs, with a space serving as the delimiter.
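By way of illustration only, the “AllWords” scheme described above may be sketched as follows. The function name `all_words_feature`, the half-open `(start, end)` span representation, the assumption that the first NE precedes the second NE, and the sample sentence are all hypothetical and are not taken from any particular implementation.

```python
def all_words_feature(tokens, ne1_span, ne2_span):
    """Build an AllWords-style feature string.

    tokens   -- list of token strings for one sentence
    ne1_span -- (start, end) token indices of the first NE, end exclusive
    ne2_span -- (start, end) token indices of the second NE, end exclusive

    Assumption: the first NE ends at or before the start of the second NE.
    """
    start1, end1 = ne1_span
    start2, end2 = ne2_span

    ne1_tokens = tokens[start1:end1]      # tokens belonging to the first NE
    between_tokens = tokens[end1:start2]  # tokens between the two NEs
    ne2_tokens = tokens[start2:end2]      # tokens belonging to the second NE

    # Concatenate all three groups, using a space as the delimiter.
    return " ".join(ne1_tokens + between_tokens + ne2_tokens)


# Hypothetical example sentence with two NEs: "Alice Smith" and "Acme Corp".
tokens = ["Alice", "Smith", "works", "for", "Acme", "Corp", "in", "Paris"]
feature = all_words_feature(tokens, (0, 2), (4, 6))
print(feature)  # Alice Smith works for Acme Corp
```

In this sketch, tokens outside the range spanned by the two NEs (here, “in” and “Paris”) do not contribute to the feature.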
In some conventional machine learning scenarios, suffixes are appended to the tokens to distinguish whether each token belongs to an NE or lies between the NEs.
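As a minimal sketch of such suffixing, the example below appends a marker to each token before concatenation. The suffix strings `_NE1`, `_BTW`, and `_NE2`, the function name, and the span representation are hypothetical placeholders chosen for illustration; conventional systems may use different markers.

```python
def all_words_with_suffixes(tokens, ne1_span, ne2_span):
    """Concatenate NE and in-between tokens, tagging each with a suffix.

    Spans are (start, end) token indices with end exclusive.
    Assumption: the first NE ends at or before the start of the second NE.
    """
    start1, end1 = ne1_span
    start2, end2 = ne2_span

    tagged = []
    # Hypothetical suffixes marking where each token came from.
    tagged += [t + "_NE1" for t in tokens[start1:end1]]  # first NE
    tagged += [t + "_BTW" for t in tokens[end1:start2]]  # between the NEs
    tagged += [t + "_NE2" for t in tokens[start2:end2]]  # second NE

    return " ".join(tagged)


tokens = ["Alice", "Smith", "works", "for", "Acme", "Corp", "in", "Paris"]
print(all_words_with_suffixes(tokens, (0, 2), (4, 6)))
# Alice_NE1 Smith_NE1 works_BTW for_BTW Acme_NE2 Corp_NE2
```

In this sketch, each suffixed token is a new string distinct from the original token, so the same word appearing in different positions yields different strings.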