@Title: On-demand Information Extraction
@File: mavir05.txt
@Participants: SEK, Satoshi, (man, C, 3, research associate professor, lecturer, Japan)
ANT, Antonio, (man, C, 3, associate professor, lecturer, Madrid)
@Date: 15/11/2007
@Place: Madrid
@Situation: Conference (II Jornadas MAVIR), conference room at university, not hidden, researcher as observer
@Topic: Language Technologies
@Source: MAVIR
@Class: formal in natural context, conference, monologue
@Length: 36' 08''
@Words: 4461
@Acoustic_quality: A
@Transcriber: Marta Garrote
@Revisor: Leonardo Campillos, Manuel Alcántara
@Comments:
ANT thank you to come here 00:00
SEK thank you 00:01
SEK thank you 00:02
SEK &ah / buenos días // and / gracias 00:06
SEK that's / that's two words I learnt 00:08
SEK ok // and now I have to speak in English 00:11
SEK &ah / ok 00:13
SEK so today / &ah / I'm talking about on-demand information extraction 00:17
SEK that is my / main research 00:19
SEK ok / I'm working on / named entity or paraphrase / etcetera // but these are all components for this task // for this system 00:26
SEK and / I presented this system at &ah ACL // as a demonstration and // I believe some Spanish guy came to me / and / give some good questions and // I don't know one of you / maybe / the person or not 00:41
SEK maybe not 00:42
SEK so / maybe this is new for / everybody 00:46
SEK so that's nice 00:47
SEK ok / so / xxx / he gave me / very good introduction // but a [/] xxx [/] at [/] a little bit more 00:54
SEK &ah / recently I'm (happy) working on / &ah / organizing a workshop on textual entailment and paraphrasing 01:02
SEK so that's &ah one of the area I'm working on // and named &en [/] named entity recognition 01:08
SEK &ah -> / this is a journal / of [/] French journal / I think 01:13
SEK I [/] I don't know 01:14
SEK &ah so / this is my area too 01:17
SEK and also / &ah / I was organizing web people search task 01:21
SEK this is &ah / joint work with Javier / and Julio // there 01:25
SEK and this is this / task to search the people name // which may be ambiguous 01:30
SEK Satoshi Sekine exists / maybe five different people 01:33
SEK and we want to distinguish it // so that's another task 01:37
SEK hhh {%act: interjection} 01:39
SEK ok these are the areas // and actually / I'm happy working on / last five years / on / these topics // ok ? 01:47
SEK on-demand information extraction at the top // this is the main area / I'm / talking today 01:51
SEK but also I work on summarization / IR and QA // and xxx English analyzer / named entity / and a Query log and web people search in Japanese spelling / etcetera 02:04
SEK but / if you are interested in / some thing other than on-demand IE / please come to me after this talk and / ask me some questions 02:11
SEK ok ? 02:12
SEK ok / so / on-demand information extraction 02:17
SEK first I will / describe what is information extraction 02:21
SEK maybe some of you / may know / that / &ah information extraction is a task to automatically extract information / on specific / scenario / from unstructured texts like newspaper 02:33
SEK and / put it / into table format 02:37
SEK so / for example / if you are interested in management succession // then / you got the newspaper which / contains lots of management succession events 02:47
SEK and / put / this into a table like / this date / this person / &ah get that [/] this / companies / this position 02:56
SEK ok ? 02:56
SEK that [/] this is much easier to see 02:59
SEK the information is structured // and / John Smith xxx company's this position // instead of reading all the news 03:09
SEK ok ? 03:09
SEK so this is called / information extraction 03:12
SEK but / there are a lot of problems 03:16
SEK the main problem / or one of the main problems / &ah / is [/] is &ah -> / &pre [/] preparation of knowledge // for a given scenario 03:26
SEK for example / once / we have a task like management succession // we have to create lots of / knowledge / about / this task / like / we have to create pattern // like / &c [/] &ah -> / company announced person's promotion to position 03:42
SEK so this is a pattern 03:43
SEK and this / tells / the management succession event 03:46
SEK and this is the company / &ah / he is going to be / in // and this is the position / and this is the person name 03:54
SEK so we have to / prepare this kind of knowledge 03:56
SEK you can imagine that / this is not only the pattern 03:59
SEK there are lots of lots of patterns / which / &ah express the management succession event 04:05
SEK so this is very laborious 04:08
SEK &ah / people has been using / &ah creating this by hand / in nineteen eighties // &ah when MUC / preparation 04:15
SEK this is one of the biggest / &ah information extraction // &ah -> preparation 04:21
SEK and people use / &ah / create by hand // which is [/] takes lot of time 04:26
SEK or create training data // and / use machine learning // to learn this patterns 04:32
SEK but that's also / time-consuming because / you have to create lots of training data 04:37
SEK so / at the MUC / Message Understanding Conference / &ah we have one month / to create this knowledge 04:46
SEK so / the organizer tell me that / task for this year 04:50
SEK this year we are interested in management succession 04:52
SEK this year we are interested in / &ah disease outbreak / etcetera 04:57
SEK then one month we create the knowledge / by hand or / using the training data 05:02
SEK then we / &a [/] after one month / we / evaluate the system 05:06
SEK so it's very very / limited 05:09
SEK &ah / so once / if you want to move to another scenario / you have to spend / another one month / to create the system 05:17
SEK all this is / bad 05:20
SEK so / my / goal / is / make this one month / into one / minute 05:27
SEK ok ? 05:29
SEK automatic 05:30
SEK &ah / in other words / creating this pattern / should be done / automatically 05:35
SEK ok ? 05:36
SEK and how ? 05:37
SEK so I'm going to tell you how / I did this 05:40
SEK &ah / I used unsupervised learning methods // &ah &pat [/] pattern discovery and paraphrase discovery // I'm going to tell 05:48
SEK and also I prepare as much knowledge as possible / for as many scenario as possible 05:54
SEK so this is for / extended named entity 05:58
SEK &ah some of / you may know I have / two hundred category named entities // and not only people location organization but / this is name / or position name etcetera 06:08
SEK ok 06:09
SEK and &ah / you / &ah [/] I can connect to the wire 06:13
SEK this says < ok ? > 06:14
ANT [<] < yes > 06:14
ANT / yes 06:15
SEK xxx very fast 06:18
SEK so at the beginning / I will show you / is there 06:21
SEK ok ? 06:22
SEK and &ah = takes time 06:27
SEK ok ? 06:29
SEK so this example 06:30
SEK I know / this / can create us [/] create a table 06:34
SEK so this is input 06:35
SEK acquire / acquisition xxx 06:38
SEK so this is talking about company's / acquisition of another company 06:42
SEK and / it will take one minute 06:45
SEK please wait with me 06:46
SEK I believe it's one minute 06:48
SEK ok 06:50
SEK and / it going to / create a table 06:53
SEK so I take this [/] this examples / &ah / topics on / ACE evaluation 06:59
SEK this is another evaluation on / information extraction / happening in United States / this days 07:05
SEK and / they have &ah / twenty or thirty / &ah topics like these 07:10
SEK and we tried = xxx at the end I can show you the evaluation results / but this &o [/] ok ? 07:17
SEK so this is a table 07:18
SEK so / this is company / company / company / &ah sometimes doesn't have company but date / and money 07:26
SEK and you can see this / example 07:29
SEK sentence says &ah / hhh {%act: blablabla imitating talk} which / this company acquired a [/] as part of its xxx / &ah two points three billion projects of / this company in nineteen ninety five 07:41
SEK so this is / exactly what I want 07:45
SEK ok ? 07:45
SEK and -> / I didn't [/] I didn't do any magic / behind this 07:51
SEK this is really true and -> you can &su [/] see these / tables / &ah sometimes it's wrong / yeah 07:59
SEK there are mistakes / of course 08:00
SEK but it's creates these / tables 08:04
SEK ok ? 08:05
SEK and / at the end / I [/] I [/] I will / ask you to give me / some task // anything 08:10
SEK I can [/] as long as I can type in here / I want to try 08:15
SEK and / at the ACL this Spanish guy came to me / and he // I don't know // didn't know what's / information extraction 08:23
SEK and he asked me the question like Spain // type Spain 08:28
SEK maybe I can try that {%com: he types and whispers} 08:33
SEK can you guess what kind of table will create it [/] will be created // about Spain ? 08:40
SEK yeah 08:41
SEK I was afraid 08:42
SEK ok / this I never tried this kind of question 08:44
SEK &ah the demos / and he typed xxx 08:48
SEK I looked at the [/] behind 08:52
SEK ok 08:53
SEK takes [/] take one minute 08:54
SEK {%com: whispers} yeah / I have idea to / make it [/] make this at least / thirty seconds / but / at the moment ... 09:03
SEK ok 09:04
SEK so this is the result 09:05
SEK you can tell 09:07
SEK right ? 09:08
SEK this is a result xxx was supposed to get 09:12
SEK I can pick one of them 09:14
SEK maybe this one 09:15
SEK {%com: he waits until the page loads} Netherlands beats Spain 09:25
SEK hhh {%act: interjection} beat hhh {%act: laugh} I didn't know 09:28
SEK you know what I'm forward to xxx 09:30
SEK so / yeah ? 09:31
SEK this is what / maybe / &ah we can expect from / a question like country name 09:36
SEK so this is a lot [/] this / kind of events happen in the newspaper a lot of times 09:41
SEK so this has be 09:42
SEK so this is information extraction task 09:44
SEK so / &ah xxx / the event has to happen / repeatedly / in / newspaper 09:51
SEK that is the task for information extraction 09:53
SEK so / we can not just [/] somebody asks / Spanish wine 09:57
SEK but it's never happening in newspaper so often // so it's / not the task for the / information extraction 10:02
SEK so / please / think / about the task 10:05
SEK and I can / ask at the end 10:07
SEK ok ? 10:08
SEK ok / so / then / now / I'm going to tell / what's going on behind this 10:16
SEK ok ? 10:18
SEK so this is a overview of the / process 10:22
SEK so / I got the description of the task like the ones / I show you // Spain etcetera // and information retrieval system just run 10:31
SEK this is / very simple xxx base / information retrieval system 10:34
SEK hhh {%act: interjection} ok ? 10:37
SEK and / I got several document 10:39
SEK I think I / got / one thousand documents 10:41
SEK I have a threshold 10:43
SEK and then from this document I got patterns // like the one I showed you / like a person's promotion to location / &ah promotion {%alt: promo-tion} to xxx a position 10:53
SEK then / I have lots of patterns 10:56
SEK ok / one certain / pattern scored by the relevancy 11:00
SEK and [///] but / if I have a one thousand pattern / so I can create one thousand different tables / but it's [/] it's not what we want 11:10
SEK we don't want to look at all -> one thousand tables 11:13
SEK so what I have to do is / connect these patterns // semantically 11:17
SEK if / these two patterns are talking about the same thing / we have to cluster at them 11:22
SEK so that we have a less number of tables 11:25
SEK ok ? 11:27
SEK so / I'm doing these / which are using paraphrase discovery method 11:31
SEK this is &al [/] also / &ah online / &ah -> on-demand // depending on what you ask 11:38
SEK ok 11:40
SEK and then we have a pattern set 11:42
SEK ok ? 11:42
SEK we have ten set of patterns / then create a table from the [/] from the / corpus // newspaper 11:49
SEK and in order to do this / I have to have a language analyzer / of course / and a &na [/] extended named entity tagger // and co-reference 11:57
SEK &ah -> but the new thing on this system / is / pattern discovery / paraphrase discovery and named entity tagger 12:04
SEK so I'm going to describe this three components / in detail 12:07
SEK ok / the first one is / &ah pattern discovery 12:12
SEK &eh actually this / idea is very simple 12:16
SEK ok ? 12:17
SEK &ah / I [/] again / I want to find / the patterns / like companies announced &p [/] person's promotion to position 12:24
SEK and the key idea is / ok / I use / information retrieval // and get the / documents 12:30
SEK and the patterns which appear in this document very often / compared to background // maybe important pattern for this / domain 12:40
SEK so this is / very / like a xxx idea [/] idea 12:44
SEK ok ? 12:45
SEK &ah -> / so / I got &t [/] for &pos [/] for example Spain 12:51
SEK I got a lot of documents / containing Spain 12:54
SEK and / such / the patterns / which connect to Spain / in this document // &ah -> which appears a lot of times in this document / and not in the / background / the &who [/] entire document 13:07
SEK that's maybe the xxx / patterns which is important from Spain 13:11
SEK that's how I got the -> [/] some country beats / of Spain / etcetera 13:17
SEK ok ? 13:18
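The scoring idea SEK describes above (patterns frequent in the retrieved documents but rare in the background collection) can be sketched roughly as follows. This is only an illustration: the talk does not give the actual scoring formula, so the frequency ratio, the add-one smoothing, the `min_count` cutoff and the function name `score_patterns` are all assumptions, and patterns are represented as plain strings.

```python
from collections import Counter

def score_patterns(retrieved_docs, background_docs, min_count=2):
    """Rank patterns that are frequent in the retrieved documents
    relative to the whole (background) collection.

    Each document is a list of pattern strings (hypothetical representation).
    """
    fg = Counter(p for doc in retrieved_docs for p in doc)
    bg = Counter(p for doc in background_docs for p in doc)
    fg_total = sum(fg.values())
    bg_total = sum(bg.values())
    scores = {}
    for pattern, count in fg.items():
        if count < min_count:
            continue  # ignore patterns seen too rarely to be reliable
        fg_rate = count / fg_total
        # Add-one smoothing (an assumption) avoids division by zero
        # for patterns that never occur in the background.
        bg_rate = (bg[pattern] + 1) / (bg_total + len(bg) + 1)
        scores[pattern] = fg_rate / bg_rate
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy usage: documents retrieved for "Spain" versus the entire collection.
retrieved = [["COUNTRY beat COUNTRY", "PERSON said"],
             ["COUNTRY beat COUNTRY", "PERSON said"]]
background = retrieved + [["PERSON said"]] * 8
ranked = score_patterns(retrieved, background)
```

In this toy data "PERSON said" is common everywhere, so it ranks below "COUNTRY beat COUNTRY", which is concentrated in the retrieved documents.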
SEK &ah -> but / the problem is / what is pattern ? 13:22
SEK what is the exact format of the pattern ? 13:25
SEK so we can think / lot of different things // like &ah -> predicate argument structure 13:30
SEK so / maybe / argument is very important for events 13:34
SEK so / between predicate and argument / can be a pattern 13:38
SEK or / we parse / sentences // and the chain from the top node to the end // maybe this can be a pattern too 13:47
SEK or / any kind of subtree in the parse tree 13:51
SEK if you know parse tree 13:52
SEK ok 13:53
SEK &ah -> / that can be a pattern too 13:56
SEK and &ah / from here to / &s [/] &ah from / predicate argument / to subtree / is / more general / but it's / computationally very expensive 14:05
SEK if you have one thousand / sentences parsed // and / any part of the subtree // maybe / millions or / much more 14:16
SEK so this is computational / &ah very expensive // and [///] but / &ah luckily we have [///] there is a algorithm / tree mining algorithm / which &co [/] really count these subtrees // important subtrees 14:29
SEK &ah -> xxx somebody / &ah in the / &ah machine learning area 14:34
SEK so &ah we use these algorithms // and at the end / near human performance was achieved 14:39
SEK ok // &ah -> / and / this is / &ah evaluation result using the -> [/] some / domain / &ah -> succession domain 14:50
SEK ok ? 14:51
SEK and &ah / subtree is expensive and it takes / lot of time but it's / achieved &ah best / &becou [///] &ah this is precision / recall // and if you go / this direction / it is best 15:04
SEK if this is here {%com: he is pointing the screen} / ten [/] hundred percent precision / eighty percent recall 15:08
SEK and the human performance is somewhere here 15:10
SEK it needs ninety / and sixty / or something this 15:15
SEK so / this subtree method is / quite good // compared to human 15:20
SEK so it finds a lot of nice patterns // based on this idea 15:25
SEK ok ? 15:26
SEK &ah that's a [///] yeah / it's not detailed but / it's a very high &ah description of the pattern discovery 15:34
SEK ok ? 15:36
SEK by this method we find / lots of patterns 15:41
SEK but as I said / there are so many patterns // and some of the patterns are talking about the same thing 15:47
SEK is [/] even if it / looks different // like &ah / I don't know 15:51
SEK &ah -> Netherlands beats Spain 15:53
SEK or / Spain was -> / I don't know / xxx [/] beaten by [/] of course beaten by Netherlanders too // but also another expressions too 16:03
SEK so we have to find the relationship between / information extraction patterns 16:08
SEK ok ? 16:09
SEK otherwise / these expressions create different tables 16:13
SEK so / &ah one of the [///] &ah I have &ah three methods / of paraphrase discovery 16:19
SEK and this is only one of the methods 16:21
SEK {%com: drinks water} I think we trust / which is / most interesting one 16:29
SEK and the key idea for this method is / the / xxx 16:34
SEK ok ? 16:35
SEK events / are usually reported in different newspapers on the same day 16:39
SEK if you have &ah two newspapers / New York Times and the Washington Post // they are talking about the same thing if there is a big event 16:47
SEK I don't know / Netherlands beats Spain 16:49
SEK that's a big event / ok ? 16:51
SEK and &ah [///] so we can find this / expression about / Netherlands beats Spain 16:58
SEK &ah but / we have to find where it is 17:02
SEK and the key / is the named entity 17:05
SEK so / Spain = whatever that expression is / Netherlands / and Spain / or two to three // this expression must be the same 17:13
SEK so use these named &en [/] named entities / as anchor / we can find / paraphrase 17:21
SEK that's the idea 17:22
SEK ok ? 17:24
SEK and &ah we observed encouraging results 17:26
SEK ok ? 17:27
SEK so this is &a [/] again / this is the procedure 17:30
SEK we have two newspapers / Washington Post and New York Times maybe // and / xxx first find a comparable article / talking about the soccer result of Spain 17:41
SEK ok ? 17:42
SEK then we named &en [/] we tag the named entity / among these / xxx 17:48
SEK so / &sa [/] &ah Spain and Netherlands etcetera 17:52
SEK ok ? 17:54
SEK then we / parse them // of course // and &ah we find the chunk / which is talking about this event 18:01
SEK so Netherlands beat / Spain // or Spain's [/] Spain's / beat by / &ah Netherlands etcetera 18:09
SEK ok ? 18:10
SEK then extract / this paraphrase 18:13
SEK ok ? 18:13
SEK this / is the method 18:15
SEK ok ? 18:17
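The anchor idea in the procedure above can be illustrated with a small sketch. It assumes the two articles have already been matched as comparable and that named entity tagging is done; simple substring matching on a given list of names stands in for the tagger, and no parsing or chunking is attempted. The function name `paraphrase_candidates` and the slot labels `X1`, `X2` are hypothetical.

```python
def paraphrase_candidates(article_a, article_b, entity_names, min_anchors=2):
    """Pair sentences from two comparable articles that mention the same
    set of named entities, and replace the shared names with numbered
    slots so the remaining wording can be compared as a paraphrase pair.
    """
    pairs = []
    for sa in article_a:
        anchors_a = {e for e in entity_names if e in sa}
        for sb in article_b:
            anchors_b = {e for e in entity_names if e in sb}
            shared = anchors_a & anchors_b
            if len(shared) < min_anchors or anchors_a != anchors_b:
                continue  # not enough common anchors to trust the pairing
            # The same name gets the same slot in both sentences,
            # which keeps the argument alignment.
            slots = {name: f"X{i}" for i, name in enumerate(sorted(shared), start=1)}
            pa, pb = sa, sb
            for name, slot in slots.items():
                pa = pa.replace(name, slot)
                pb = pb.replace(name, slot)
            if pa != pb:
                pairs.append((pa, pb))
    return pairs

# Toy usage with the example from the talk.
paper_a = ["Netherlands beat Spain", "The weather was fine"]
paper_b = ["Spain was beaten by Netherlands"]
candidates = paraphrase_candidates(paper_a, paper_b, ["Netherlands", "Spain"])
```

Here `candidates` pairs "X1 beat X2" with "X2 was beaten by X1", where X1 stands for Netherlands and X2 for Spain.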
SEK and this was / presented at some / conference two thousand two or three / etcetera 18:22
SEK and &ah this is the result / one of the results / based on Japanese newspaper 18:27
SEK &ah on / special / event like murder suspect 18:31
SEK ok ? 18:33
SEK and &ah / accuracy is [/] precision is something like / fifty / &sixt [/] sixty two 18:38
SEK and recall is something like forty 18:41
SEK we can not really find recall / because there are so many expressions in newspaper / and we can not / find all the / &ah paraphrase / in the newspaper 18:51
SEK so it's not easy to find the recall but / is something like / this area 18:55
SEK ok ? 18:57
SEK so / this is one of the method / &ah to find paraphrase 19:01
SEK ok ? 19:02
SEK and [///] ok 19:04
SEK I'm going to [///] yeah I have time {%com: having a look at his watch} 19:06
SEK so I'm going to talk the / another method / of the / paraphrase discovery 19:11
SEK this is called &ah [/] this is / &ah -> / done through relation discovery 19:18
SEK ok ? 19:19
SEK &ah this is slightly different at the beginning but / at the end / we will find the paraphrase 19:24
SEK ok 19:26
SEK motivation is / that / &ah for the relation discovery task // and &ah -> motivation for this / task is / &ah discovering particular relation between named entities / &ah for example between country / and / &ah / between / country name and person name 19:42
SEK there are many / different kinds of relations / like / president relation / and prime minister relation / or / coach of the soccer / of Spain // kind of relation 19:53
SEK and company-company relation there are / much the relation parent / child / relation etcetera 19:59
SEK and we can not / really prepare / how many / relations exist / between these pairs 20:06
SEK and this task is to find / those relations 20:10
SEK ok ? 20:11
SEK we don't know / the relationship / or the relationships in advance 20:15
SEK and we will try to find / as many relationships as possible 20:20
SEK ok ? 20:20
SEK and the basic idea is context based clustering 20:23
SEK and &ah / I &w [/] I will show you 20:27
SEK a much easier way {%com: whispers while looks for the transparency} so this is procedure but / I will describe it on this / example 20:32
SEK &ah for example &ah / in newspapers there are lots of / &ah expressions // like / &ah / is offering to buy / and / instances like &ah Disney / is offering to buy ABC // or this is interest in ABC // this is / negotiating to ABC // to acquire ABC etcetera // because this import / ABC 20:56
SEK and also between IBM and Lotus / there is a expression / similar expression because they bought the / company too 21:03
SEK so we / tag the named entity in the corpus 21:08
SEK and we can / &ah accumulate this / context / between / company-company / relation // or maybe / person-location relation // but xxx [/] for example this case / company-company relation 21:21
SEK and we found / &ah &pat [/] some particular / pair of / companies // we find lots of expressions like this 21:30
SEK so using this / these words as a feature / for clustering 21:36
SEK we can find the cluster / of / acquired relation 21:40
SEK you get idea ? 21:42
SEK ok ? 21:45
SEK so if you [/] if we cluster / based on this word / like buy / acquire / purchase / there's [/] there are lot of / xxx &ah / pair of companies which share this / words 21:56
SEK so / once we cluster them / we have a pair of companies / which / is acquired relations 22:03
SEK ok ? 22:04
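A rough sketch of the context-based clustering just described: each pair of names is represented by the words found between them, and pairs with similar contexts are grouped. The talk does not specify the similarity measure or the clustering algorithm, so cosine similarity over word counts, the greedy grouping, the threshold value and the name `cluster_pairs` are assumptions made for illustration.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cluster_pairs(contexts, threshold=0.3):
    """Group entity pairs whose context words are similar.

    contexts maps (name1, name2) to the list of words observed between
    the two names. A pair joins the first cluster that already holds a
    sufficiently similar pair; otherwise it starts a new cluster.
    """
    vectors = {pair: Counter(words) for pair, words in contexts.items()}
    clusters = []
    for pair, vec in vectors.items():
        for cluster in clusters:
            if any(cosine(vec, vectors[other]) >= threshold for other in cluster):
                cluster.append(pair)
                break
        else:
            clusters.append([pair])
    return clusters

# Toy usage: two acquisition pairs and one unrelated pair (invented data).
contexts = {
    ("Disney", "ABC"): ["buy", "acquire", "offer"],
    ("IBM", "Lotus"): ["buy", "purchase"],
    ("CompanyX", "CompanyY"): ["sue", "lawsuit"],
}
clusters = cluster_pairs(contexts)
```

The two pairs that share "buy" end up in one cluster, which would correspond to the acquisition relation; the third pair stays on its own.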
SEK and this is the result 22:06
SEK so / this is a result for / person and location / relation 22:11
SEK there are [/] we find / president relation / senator relation / prime minister relation / governor relation / secretary relation / republican relation / and coach relation // from [/] this is &ah New York Times / ninety five 22:23
SEK and &ah in [/] for example in / president relation there are twenty three &e [/] &ah examples were find 22:33
SEK and / seventeen of them / are / really correct / president relations 22:38
SEK so / it was / so so accurate 22:40
SEK and in this cluster / this was / a dominating / &ah president / and president in / small letter 22:48
SEK and / I don't know 22:50
SEK for governor relation we found / sixteen / and fifteen out of sixteen / are correct 22:55
SEK and / there are / words like these 22:58
SEK so / this cluster method find / &ah relationships between named entities 23:03
SEK ok ? 23:04
SEK ok ? 23:07
SEK so this is evaluation results 23:09
SEK cluster [/] so cluster &ten [///] we / evaluate / the clusters which have more than five / &ah membership 23:16
SEK and &ah / accuracy is hundred percent 23:19
SEK and / any pair level [///] so pair level means / accumulate all of these / &ah by / and all of this 23:27
SEK so this is &ah eighty nine percent 23:29
SEK so it's / good accuracy 23:31
SEK it's [/] the cluster &ah [/] the recall is not that high / &ah / but it's / about sixty percent 23:39
SEK ok ? 23:41
SEK and error [/] we / did error analysis 23:44
SEK and / for example / a expression like / Chechnya war may exhaust / &Presid &ah President Boris Yeltsin 23:51
SEK and / from this sentence / we found the relationship that / &ah Boris Yeltsin is the president of Chechnya // which is not true 23:58
SEK ok ? 23:59
SEK so that's wrong 24:01
SEK and / also / &ah the recall error / we missed / is something like / Boris Yeltsin on the end of fighting in Chechnya 24:10
SEK ok ? 24:11
SEK and / this can not be / acquired / because there's no / so much common words / between these names 24:21
SEK ok ? 24:21
SEK only fighting / and end of fighting is [/] doesn't happen so often in the newspapers / first of all 24:28
SEK and this words are not common 24:30
SEK so [/] so that's why recall is not high 24:33
SEK we have to have / more / expressions 24:36
SEK but [///] ok up to here / this is the relation discovery / method / &ah presented at &ah ACL &nine [/] two thousand four I think 24:47
SEK and then [///] well / once we've found lots of relations / nice relations / maybe that expression / exists in / particular cluster // is talking about the same thing 25:00
SEK you could have a / &ah buying relation / between companies // maybe / we have a [/] lots of / different / kinds of expressions / about &acquisi [/] acquisition of company 25:13
SEK so / but &the [/] of course there are many / &ah noises // even between Disney and / ABC 25:21
SEK there are expressions / there are expressions / xxx acquisition 25:25
SEK so we have to have a filter // to delete the noises 25:29
SEK so we have two filters 25:31
SEK ok ? 25:32
SEK the one condition is that &ah / the expression has to be used in / more than one pair of instances 25:39
SEK so it &ha [/] it can not be particular for one / particular pair of news &ah -> / &ah / pair of names 25:47
SEK for example / Disney and ABC &may [/] maybe there are / very / peculiar expression / between this two 25:53
SEK so we don't want [/] we don't want that 25:57
SEK ok ? 25:58
SEK and also expression has to contain frequent term / like &ah / buy / etcetera 26:03
SEK otherwise we have a lots of [/] lots of noises in different expressions 26:09
SEK so / based on this features we have a / expression like A bought B / A has agreed to buy B / A which is buying B / A's proposed acquisition of B / etcetera 26:20
SEK so these are very nice / &ah -> paraphrase 26:23
SEK ok ? 26:24
SEK so we can use this to cluster the patterns 26:27
SEK ok ? 26:28
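The two filters described above (an expression must occur with more than one pair of names, and it must contain a frequent term) can be sketched as below. The data layout, the whitespace tokenization and the name `filter_expressions` are assumptions; the list of frequent terms is taken as given rather than computed.

```python
def filter_expressions(cluster, frequent_terms):
    """Keep expressions that pass both filters.

    cluster maps (name1, name2) to the expressions seen for that pair,
    with the names already replaced by the placeholders A and B.
    """
    pairs_using = {}
    for pair, expressions in cluster.items():
        for expr in set(expressions):
            pairs_using.setdefault(expr, set()).add(pair)
    kept = []
    for expr, pairs in pairs_using.items():
        used_by_several_pairs = len(pairs) > 1  # filter 1: not peculiar to one pair
        has_frequent_term = any(t in expr.split() for t in frequent_terms)  # filter 2
        if used_by_several_pairs and has_frequent_term:
            kept.append(expr)
    return sorted(kept)

# Toy usage (invented data).
cluster = {
    ("Disney", "ABC"): ["A bought B", "A has agreed to buy B", "A chairman met B"],
    ("IBM", "Lotus"): ["A bought B", "A has agreed to buy B", "A said B"],
    ("CompanyX", "CompanyY"): ["A said B"],
}
paraphrases = filter_expressions(cluster, {"buy", "bought"})
```

"A chairman met B" is dropped because only one pair uses it, and "A said B" is dropped because it has no frequent term.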
SEK and this is another one [///] &o [/] ok // I'm not going to talk this but this is another one 26:34
SEK ok 26:37
SEK so / and &ah [///] ok 26:40
SEK next one / is a named entity 26:42
SEK so &ah / so / usually people use / five or six / named entities like person location organization / or / maybe facilities or / weapons // because we [/] we are working with &ah [/] &ah DARPA 26:59
SEK it's not enough for everything 27:03
SEK you may ask 27:04
SEK so we are creating named entities with / lots of varieties / including two hundred categories right now 27:11
SEK ok ? 27:13
SEK this is based on the / observation of the task / of information extraction / or question answering 27:20
SEK people were / interested in what kind of questions / what kind of types 27:24
SEK and &ah xxx or the capital letter in newspapers 27:29
SEK I got / several thousand of / capital letter words and cluster them // and try to find the categories 27:36
SEK and there's a definition &ah / hundred fifty pages HTML 27:43
SEK you can look at / and &ah that was released 27:46
SEK ok ? 27:48
SEK and / at the moment we have automatic tagger 27:51
SEK this is / rule-based 27:53
SEK yeah / because [///] ok / for eight categories / you can / create a training data / and machine learning can / do this job // but it's not easy for two hundred categories 28:05
SEK so we created dictionaries / and rules like / some capital letter followed [/] followed by &ah / mister / is a person name etcetera 28:15
SEK that's simple 28:16
SEK rules but / this is rule-based 28:18
SEK and we got seventy percent accuracy 28:20
SEK it's not / that high but / for example / &ah tagging product name is quite difficult 28:27
SEK you know xxx &ah anything can be product name 28:30
SEK so / these are difficult things 28:33
SEK yeah people names are ninety percent 28:35
SEK it's just like / other things but still / improving 28:39
SEK ok 28:41
SEK and / right now / I'm trying to &extra [///] this is a strategy of [/] from the / information extraction xxx ODIE 28:49
SEK but / at the moment &ah [///] = ok / we / have this two hundred categories / &ah &m [/] &ex [/] examples &ah / on the location / there is &ah / continental / or / &ah / domestic region / etcetera // and / among GPE / for example &ah / geological and / political entity / like / country / city / province / etcetera 29:15
SEK and these are [/] these are nice names // categories of names 29:19
SEK but / &ah usually / proper names / has [/] has / expressed by / some symbol / like Satoshi Sekine is me // but I have lots of attributes 29:32
SEK I'm [/] I'm Japanese // nationality Japanese 29:35
SEK I'm tall as this 29:37
SEK and &ah / something else 29:40
SEK and this is very important for names / because name [///] the suffix [/] the string of the name is / whatever it is / Satoshi Sekine / hhh {%act: spells his name} / that's / symbol 29:51
SEK but my / property is my / attribute // expressed [/] can be expressed by attributes 29:57
SEK so I thought that it's very interesting to / categorize &ah or / create attributes for / each named entity 30:05
SEK and this is example / for the person attributes 30:10
SEK so person / should have / vocation / nationality / career / masterpiece / &ah graduate from these schools / hometown / etcetera 30:21
SEK and once / we have this kind of attributes and / that can be used for information extraction / or question-answering / etcetera 30:30
SEK for example / question-answering / when / we look at questions / in question-answering task // and people sometimes ask the attribute of the names / like / what is the height of Mount Fuji ? 30:41
SEK then / if the / information is structured like this / then / &thus [/] this question-answering can be answered / by / just a / SQL / type of question // &ah manipulation 30:54
SEK or maybe / some people ask that / what is the [/] what is the highest mountain in Japan ? 31:01
SEK then we [/] we have these data / then we can answer / by / a database manipulation 31:07
SEK so / this is the way to / &ah structure the information in the world / I believe 31:13
SEK so I'm working on this 31:15
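The point that attribute questions become database lookups once the information is structured can be illustrated with a small sketch. The table layout, the column names and the helper functions are hypothetical; the talk only says that an SQL type of manipulation would be enough. The two heights are well-known figures included for the example.

```python
import sqlite3

# In-memory table of mountains with a few attributes (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mountain (name TEXT, country TEXT, height_m INTEGER)")
conn.executemany(
    "INSERT INTO mountain VALUES (?, ?, ?)",
    [("Mount Fuji", "Japan", 3776), ("Mount Kita", "Japan", 3193)],
)

def height_of(name):
    """Answer 'what is the height of X ?' with a single lookup."""
    row = conn.execute(
        "SELECT height_m FROM mountain WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None

def highest_in(country):
    """Answer 'what is the highest mountain in X ?' by sorting on height."""
    row = conn.execute(
        "SELECT name FROM mountain WHERE country = ? ORDER BY height_m DESC LIMIT 1",
        (country,),
    ).fetchone()
    return row[0] if row else None
```

With this table, `height_of("Mount Fuji")` returns 3776 and `highest_in("Japan")` returns "Mount Fuji".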
SEK &ah / ok 31:17
SEK anyway / so / &ah / briefly I explain / what the pattern discovery is / and what the paraphrase discovery is and / named entities 31:28
SEK and we make / this system works / ok ? 31:32
SEK &ah / ok / that's it 31:34
SEK so / I can show you [/] I can / accept any / question if you have 31:41
SEK and / let's see 31:46
SEK I will show you 31:48
SEK I want to prove that this is not fake 31:50
SEK this is / real / demo 31:52
SEK ok ? 31:56
SEK this is sentence 31:57
SEK and / I did the evaluation 32:00
SEK so / &ah it [/] this is [/] this can be done by [/] only by subjective evaluation 32:06
SEK it's not to say / this accurate or not 32:09
SEK so / I asked several people / to / look at the table / and pick up / twenty topics 32:15
SEK &ah / I showed in the / demo 32:17
SEK there is a ACE topic 32:18
SEK and twenty of them happens [/] appears a lot in the newspaper / ok ? 32:23
SEK and / we learned it and / created a table / and asked them if that is / very useful / or useful / or not useful / for the further search 32:32
SEK yeah / I'm not [/] I'm not trying to convince / this / as a final result 32:37
SEK this is only the &m [/] intermediate step / to / get the real information 32:42
SEK so this [/] if this help / for you / to search more information / I'm happy / ok ? 32:49
SEK so / I asked them if / actually / this table / is final result [/] useful as a final result = I'm sorry / this has a typo {%com: referring to the transparency} // &ah / very [/] &ah [///] ok 33:01
SEK I asked them / to say / this is very useful 33:04
SEK and if / they found this is / good [/] good / table / in order to do more search / and / they judge as useful 33:14
SEK and / if the table is xxx then / they said / this is not useful / ok ? 33:20
SEK out of twenty / two are very useful / twelve are useful / and six are not useful 33:25
SEK so / this is reasonable / result I believe / ok ? 33:30
SEK and &ah / and also / correctness of the table fillers 33:36
SEK that's can be / evaluated easily 33:38
SEK so / pick up hundred random / rows of the table // and how accurate they are 33:44
SEK and / &ah -> / out of hundred / eighty four / correct / and four partially correct / and twelve incorrect 33:52
SEK so it's [///] there are some mistakes there // too 33:55
SEK &ah sometimes it's / the event / which is not related to 34:00
SEK for example Nobel Prize / like xxx on that is incorrect 34:03
SEK and also there are / incorrect / based &o [/] &ah because of the named entities 34:08
SEK named entity tagger made / lots of / errors so / it's [/] that is true 34:13
SEK so / maybe &i [/] in the person / look / a person xxx / maybe the company / appears on the / person xxx that's incorrect / ok ? 34:23
SEK so / that is it 34:24
SEK but &ah there are many ways to improve this 34:27
SEK &ah for example / one of them is / this / label / ok ? 34:32
SEK I just say / company / or money / or date 34:35
SEK this is category name for named entities 34:37
SEK but there is a [/] there are xxx for / the event / for / for example this company / acquired the company / &acqui [/] &acqui [/] acquiring company and acquired the company 34:48
SEK or this is amount of money / of the deal 34:52
SEK or this is / something else 34:54
SEK so / I want to put some / meaningful label for this 34:58
SEK and this is what / I'm working on right now 35:01
SEK and &ah -> / also [///] other things I'm working on {%com: whisper} 35:07
SEK I [/] I try to speed up this / &ah / one &mi [/] one second [/] one / minute is still / slow 35:13
SEK so I have the idea to / at least double the speed 35:17
SEK and &ah / yeah 35:20
SEK also = yeah 35:21
SEK the one of the things / I'm very interested in is / improving named entity accuracy 35:26
SEK it's still seventy percent 35:27
SEK so it's not good enough 35:29
SEK so / there are several research I'm going on [/] I'm / working on / to improve the named entities 35:35
SEK ok 35:38
SEK so / I described / these three / IE pattern discovery / paraphrase discovery / relation discovery / in the [/] on-demand IE / and named entity / and attribute for a named entity // to describe all [/] all of these / ok ? 35:54
SEK {%com: looking at his watch} it's about [/] well / I &do [/] I don't know / one hour 35:57
SEK I / < can keep / talking > 36:00
ANT [<] < &ye [/] yes > 36:00
SEK ¬ something else too 36:01
ANT ok 36:03
SEK ok 36:03
SEK maybe I can stop here and / get a question on / on demand information extraction 36:08