Transcription conventions

@Title: On-demand Information Extraction
@File: mavir05.txt
@Participants: SEK, Satoshi, (man, C, 3, research associate professor, lecturer, Japan)
		ANT, Antonio, (man, C, 3, associate professor, lecturer, Madrid)	
@Date: 15/11/2007
@Place: Madrid
@Situation: Conference (II Jornadas MAVIR), conference room at university, not hidden, researcher as observer
@Topic: Language Technologies
@Source: MAVIR 
@Class: formal in natural context, conference, monologue
@Length: 36' 08''
@Words: 4461
@Acoustic_quality: A
@Transcriber: Marta Garrote
@Revisor: Leonardo Campillos, Manuel Alcántara
@Comments:

ANT thank you to come here 00:00

SEK thank you 00:01

SEK thank you 00:02

SEK &ah / buenos días // and / gracias 00:06

SEK that's / that's two words I learnt 00:08

SEK ok // and now I have to speak in English 00:11

SEK &ah / ok 00:13

SEK so today / &ah / I'm talking about on-demand information extraction 00:17

SEK that is my / main research 00:19

SEK ok / I'm working on / named entity or paraphrase / etcetera // but these are all components for this task // for this system 00:26

SEK and / I presented this system at &ah ACL // as a demonstration and // I believe some Spanish guy came to me / and / give some good questions and // I don't know one of you / maybe / the person or not 00:41

SEK maybe not 00:42

SEK so / maybe this is new for / everybody 00:46

SEK so that's nice 00:47

SEK ok / so / xxx / he gave me / very good introduction // but a [/] xxx [/] at [/] a little bit more 00:54

SEK &ah / recently I'm (happy) working on / &ah / organizing a workshop on textual entailment and paraphrasing 01:02

SEK so that's &ah one of the area I'm working on // and named &en [/] named entity recognition 01:08

SEK &ah -> / this is a journal / of [/] French journal / I think 01:13

SEK I [/] I don't know 01:14

SEK &ah so / this is my area too 01:17

SEK and also / &ah / I was organizing web people search task 01:21

SEK this is &ah / joined work with Javier / and Julio // there 01:25

SEK and this is this / task to search the people name // which maybe ambiguous 01:30

SEK Satoshi Sekine exists / maybe five different people 01:33

SEK and we want to distinguish it // so that's another task 01:37

SEK hhh {%act: interjection} 01:39

SEK ok these are the areas // and actually / I'm happy working on / last five years / on / these topics // ok ? 01:47

SEK on-demanding information extraction at the top // this is the main area / I'm / talking today 01:51

SEK but also I work on summarization / IR and QA // and xxx English analyzer / named entity / and a Query log and web people search in Japanese spelling / etcetera 02:04

SEK but / if you are interested in / some thing other than on-demand IE / please come to me after this talk and / ask me some questions 02:11

SEK ok ? 02:12

SEK ok / so / on-demanding information extraction 02:17

SEK first I will / describe what is information extraction 02:21

SEK maybe some of you / may know / that / &ah information extraction is a task to automatically extract information / on specific / scenario / from unstructured texts like newspaper 02:33

SEK and / put it / into table format 02:37

SEK so / for example / if you are interested in management succession // then / you got the newspaper which / contents lots of management succession events 02:47

SEK and / put / this into a table like / this date / this person / &ah get that [/] this / companies / this position 02:56

SEK ok ? 02:56

SEK that [/] this is much easier to see 02:59

SEK the information is structured // and / John Smith xxx company's this position // instead of reading all the news 03:09

SEK ok ? 03:09

SEK so this is called / information extraction 03:12

SEK but / there are a lot of problems 03:16

SEK the main problem / or one of the main problems / &ah / is [/] is &ah -> / &pre [/] preparation of knowledge // for a given scenario 03:26

SEK for example / once / we have a task like management succession // we have to create lots of / knowledge / about / this task / like / we have to create pattern // like / &c [/] &ah -> / company announced person's promotion to position 03:42

SEK so this is a pattern 03:43

SEK and this / tells / the management succession event 03:46

SEK and this is the company / &ah / he is going to be / in // and this is the position / and this is the person name 03:54

SEK so we have to / prepare this kind of knowledge 03:56
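
A minimal sketch of what one hand-written pattern of this kind could look like, and how it fills one row of the table. The regular expression, the function name `extract_succession` and the company name "Acme Corp" are illustrative assumptions; the system described in the talk fills the slots from named entity tags rather than from capitalization.

```python
import re

# Hypothetical hand-written pattern for the management succession scenario:
# "<company> announced <person>'s promotion to <position>".
SUCCESSION_PATTERN = re.compile(
    r"(?P<company>[A-Z]\w+(?: [A-Z]\w+)*) announced "
    r"(?P<person>[A-Z]\w+(?: [A-Z]\w+)*)'s promotion to "
    r"(?P<position>[a-z ]+)"
)

def extract_succession(sentence):
    """Return one table row (a dict) if the pattern matches, else None."""
    match = SUCCESSION_PATTERN.search(sentence)
    return match.groupdict() if match else None

row = extract_succession(
    "Acme Corp announced John Smith's promotion to vice president."
)
print(row)
```

Every other way of reporting the same event would need its own pattern, which is why preparing this knowledge by hand is laborious.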

SEK you can imagine that / this is not only the pattern 03:59

SEK there are lots of lots of patterns / which / &ah express the management succession event 04:05

SEK so this is very laborious 04:08

SEK &ah / people has been using / &ah creating this by hand / in nineteen eighties // &ah when MUC / preparation 04:15

SEK this is one of the biggest / &ah information extraction // &ah -> preparation 04:21

SEK and people use / &ah / create by hand // which is [/] takes lot of time 04:26

SEK or create training data // and / use machine learning // to learn this patterns 04:32

SEK but that's also / time-consuming because / you have to create lots of training data 04:37

SEK so / at the MUC / Message Understanding Conference / &ah we have one month / to create this knowledge 04:46

SEK so / the organizer tell me that / task for this year 04:50

SEK this year we are interested in management succession 04:52

SEK this year we are interested in / &ah disease outbreak / etcetera 04:57

SEK then one month we create the knowledge / by hand or / using the training data 05:02

SEK then we / &a [/] after one month / we / evaluate the system 05:06

SEK so it's very very / limited 05:09

SEK &ah / so once / if you want to move to another scenario / you have to spend / another one month / to create the system 05:17

SEK all this is / bad 05:20

SEK so / my / goal / is / make this one month / into one / minute 05:27

SEK ok ? 05:29

SEK automatic 05:30

SEK &ah / in other words / creating this pattern / should be done / automatically 05:35

SEK ok ? 05:36

SEK and how ? 05:37

SEK so I'm going to tell you how / I did this 05:40

SEK &ah / I used unsupervised learning methods // &ah &pat [/] pattern discovery and paraphrase discovery // I'm going to tell 05:48

SEK and also I prepare as much knowledge as possible / for as many scenario as possible 05:54

SEK so this is for / extended named entity 05:58

SEK &ah some of / you may know I have / two hundred category named entities // and not only people location organization but / this is name / or position name etcetera 06:08

SEK ok 06:09

SEK and &ah / you / &ah [/] I can connect to the wire 06:13

SEK this says < ok ? > 06:14

ANT [<] < yes > 06:14

ANT / yes 06:15

SEK xxx very fast 06:18

SEK so at the beginning / I will show you / is there 06:21

SEK ok ? 06:22

SEK and &ah = takes time 06:27

SEK ok ? 06:29

SEK so this example 06:30

SEK I know / this / can create us [/] create a table 06:34

SEK so this is input 06:35

SEK acquire / acquisition xxx 06:38

SEK so this is talking about company's / acquisition of another company 06:42

SEK and / it will take one minute 06:45

SEK please wait with me 06:46

SEK I believe it's one minute 06:48

SEK ok 06:50

SEK and / it going to / create a table 06:53

SEK so I take this [/] this examples / &ah / topics on / ACE evaluation 06:59

SEK this is another evaluation on / information extraction / happening in United States / this days 07:05

SEK and / they have &ah / twenty or thirty / &ah topics like these 07:10

SEK and we tried = xxx at the end I can show you the evaluation results / but this &o [/] ok ? 07:17

SEK so this is a table 07:18

SEK so / this is company / company / company / &ah sometimes doesn't have company but date / and money 07:26

SEK and you can see this / example 07:29

SEK sentence says &ah / hhh {%act: blablabla imitating talk} which / this company acquired a [/] as part of its xxx / &ah two points three billion projects of / this company in nineteen ninety five 07:41

SEK so this is / exactly what I want 07:45

SEK ok ? 07:45

SEK and -> / I didn't [/] I didn't do any magic / behind this 07:51

SEK this is really true and -> you can &su [/] see these / tables / &ah sometimes it's wrong / yeah 07:59

SEK there are mistakes / of course 08:00

SEK but it's creates these / tables 08:04

SEK ok ? 08:05

SEK and / at the end / I [/] I [/] I will / ask you to give me / some task // anything 08:10

SEK I can [/] as long as I can type in here / I want to try 08:15

SEK and / at the ACL this Spanish guy came to me / and he // I don't know // didn't know what's / information extraction 08:23

SEK and he asked me the question like Spain // type Spain 08:28

SEK maybe I can try that {%com: he types and whispers} 08:33

SEK can you guess what kind of table will create it [/] will be created // about Spain ? 08:40

SEK yeah 08:41

SEK I was afraid 08:42

SEK ok / this I never tried this kind of question 08:44

SEK &ah the demos / and he typed xxx 08:48

SEK I looked at the [/] behind 08:52

SEK ok 08:53

SEK takes [/] take one minute 08:54

SEK {%com: whispers} yeah / I have idea to / make it [/] make this at least / thirty seconds / but / at the moment ... 09:03

SEK ok 09:04

SEK so this is the result 09:05

SEK you can tell 09:07

SEK right ? 09:08

SEK this is a result xxx was supposed to get 09:12

SEK I can pick one of them 09:14

SEK maybe this one 09:15

SEK {%com: he waits until the page loads} Netherlands beats Spain 09:25

SEK hhh {%act: interjection} beat hhh {%act: laugh} I didn't know 09:28

SEK you know what I'm forward to xxx 09:30

SEK so / yeah ? 09:31

SEK this is what / maybe / &ah we can expect from / a question like country name 09:36

SEK so this is a lot [/] this / kind of events happen in the newspaper a lot of times 09:41

SEK so this has be 09:42

SEK so this is information extraction task 09:44

SEK so / &ah xxx / the event has to happen / repeatedly / in / newspaper 09:51

SEK that is the task for information extraction 09:53

SEK so / we can not just [/] somebody asks / Spanish wine 09:57

SEK but it's never happening in newspaper so often // so it's / not the task for the / information extraction 10:02

SEK so / please / think / about the task 10:05

SEK and I can / ask at the end 10:07

SEK ok ? 10:08

SEK ok / so / then / now / I'm going to tell / what's going on behind this 10:16

SEK ok ? 10:18

SEK so this is a overview of the / process 10:22

SEK so / I got the description of the task like the ones / I show you // Spain etcetera // and information retrieval system just run 10:31

SEK this is / very simple xxx base / information retrieval system 10:34

SEK hhh {%act: interjection} ok ? 10:37

SEK and / I got several document 10:39

SEK I think I / got / one thousand documents 10:41

SEK I have a threshold 10:43

SEK and then from this document I got patterns // like the one I showed you / like a person's promotion to location / &ah promotion {%alt: promo-tion} to xxx a position 10:53

SEK then / I have lots of patterns 10:56

SEK ok / one certain / pattern scored by the relevancy 11:00

SEK and [///] but / if I have a one thousand pattern / so I can create one thousand different tables / but it's [/] it's not what we want 11:10

SEK we don't want to look at all -> one thousand tables 11:13

SEK so what I have to do is / connect these patterns // semantically 11:17

SEK if / these two patterns are talking about the same thing / we have to cluster at them 11:22

SEK so that we have a less number of tables 11:25

SEK ok ? 11:27

SEK so / I'm doing these / which are using paraphrase discovery method 11:31

SEK this is &al [/] also / &ah online / &ah -> on-demand // depending on what you ask 11:38

SEK ok 11:40

SEK and then we have a pattern set 11:42

SEK ok ? 11:42

SEK we have ten set of patterns / then create a table from the [/] from the / corpus // newspaper 11:49

SEK and in order to do this / I have to have a language analyzer / of course / and a &na [/] extended named entity tagger // and co-reference 11:57

SEK &ah -> but the new thing on this system / is / pattern discovery / paraphrase discovery and named entity tagger 12:04

SEK so I'm going to describe this three components / in detail 12:07

SEK ok / the first one is / &ah pattern discovery 12:12

SEK &eh actually this / idea is very simple 12:16

SEK ok ? 12:17

SEK &ah / I [/] again / I want to find / the patterns / like companies announced &p [/] person's promotion to position 12:24

SEK and the key idea is / ok / I use / information retrieval // and get the / documents 12:30

SEK and the patterns which appear in this document very often / compared to background // maybe important pattern for this / domain 12:40

SEK so this is / very / like a xxx idea [/] idea 12:44

SEK ok ? 12:45

SEK &ah -> / so / I got &t [/] for &pos [/] for example Spain 12:51

SEK I got a lot of documents / containing Spain 12:54

SEK and / such / the patterns / which connect to Spain / in this document // &ah -> which appears a lot of times in this document / and not in the / background / the &who [/] entire document 13:07

SEK that's maybe the xxx / patterns which is important from Spain 13:11

SEK that's how I got the -> [/] some country beats / of Spain / etcetera 13:17

SEK ok ? 13:18
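
A minimal sketch of this scoring idea, assuming the patterns have already been extracted from each document. The TF*IDF-style weight, the function name `score_patterns` and the toy documents are assumptions made for illustration, not the scoring actually used in the system.

```python
from collections import Counter
import math

def score_patterns(retrieved_docs, background_docs):
    """Rank patterns that are frequent in the retrieved documents
    but rare in the collection as a whole.

    Each document is a list of pattern strings; extracting them
    (parsing, named entity tagging) is out of scope here.
    """
    # How often each pattern occurs in the retrieved documents.
    tf = Counter(p for doc in retrieved_docs for p in doc)
    # In how many documents of the whole collection each pattern occurs.
    all_docs = retrieved_docs + background_docs
    df = Counter(p for doc in all_docs for p in set(doc))
    n_docs = len(all_docs)
    scores = {p: tf[p] * math.log(n_docs / df[p]) for p in tf}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Toy data: documents retrieved for the query "Spain" and a background set.
retrieved = [
    ["COUNTRY beat Spain", "PERSON said"],
    ["COUNTRY beat Spain", "PERSON said"],
    ["COUNTRY beat Spain"],
]
background = [
    ["PERSON said"],
    ["PERSON said"],
    ["PERSON said"],
    ["COMPANY bought COMPANY"],
]
ranked = score_patterns(retrieved, background)
print(ranked[0][0])
```

A pattern such as "PERSON said" occurs everywhere, so it gets a low weight even though it is frequent in the retrieved documents.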

SEK &ah -> but / the problem is / what is pattern ? 13:22

SEK what is the exact format of the pattern ? 13:25

SEK so we can think / lot of different things // like &ah -> predicate argument structure 13:30

SEK so / maybe / argument is very important for events 13:34

SEK so / between predicate and argument / can be a pattern 13:38

SEK or / we parse / sentences // and the chain from the top node to the end // maybe this can be a pattern too 13:47

SEK or / any kind of subtree in the parsetree 13:51

SEK if you know parsetree 13:52

SEK ok 13:53

SEK &ah -> / that can be a pattern too 13:56

SEK and &ah / from here to / &s [/] &ah from / predicate argument / to subtree / is / more general / but it's / computationally very expensive 14:05

SEK if you have one thousand / sentences parsed // and / any part of the subtree // maybe / millions or / much more 14:16

SEK so this is computational / &ah very expensive // and [///] but / &ah luckily we have [///] there is a algorithm / tree mining algorithm / which &co [/] really count these subtrees // important subtrees 14:29

SEK &ah -> xxx somebody / &ah in the / &ah machine learning area 14:34

SEK so &ah we use these algorithms // and at the end / near human performance was achieved 14:39

SEK ok // &ah -> / and / this is / &ah evaluation result using the -> [/] some / domain / &ah -> succession domain 14:50

SEK ok ? 14:51

SEK and &ah / subtree is expensive and it takes / lot of time but it's / achieved &ah best / &becou [///] &ah this is precision / recall // and if you go / this direction / it is best 15:04

SEK if this is here {%com: he is pointing the screen} / ten [/] hundred percent precision / eighty percent recall 15:08

SEK and the human performance is somewhere here 15:10

SEK it needs ninety / and sixty / or something this 15:15

SEK so / this subtree method is / quite good // compared to human 15:20

SEK so it finds a lot of nice patterns // based on this idea 15:25

SEK ok ? 15:26

SEK &ah that's a [///] yeah / it's not detailed but / it's a very high &ah description of the pattern discovery 15:34

SEK ok ? 15:36

SEK by this method we find / lots of patterns 15:41

SEK but as I said / there are so many patterns // and some of the patterns are talking about the same thing 15:47

SEK is [/] even if it / looks different // like &ah / I don't know 15:51

SEK &ah -> Netherlands beats Spain 15:53

SEK or / Spain was -> / I don't know / xxx [/] beaten by [/] of course beaten by Netherlanders too // but also another expressions too 16:03

SEK so we have to find the relationship between / information extraction patterns 16:08

SEK ok ? 16:09

SEK otherwise / these expressions create different tables 16:13

SEK so / &ah one of the [///] &ah I have &ah three methods / of paraphrase discovery 16:19

SEK and this is only one of the methods 16:21

SEK {%com: drinks water} I think we trust / which is / most interesting one 16:29

SEK and the key idea for this method is / the / xxx 16:34

SEK ok ? 16:35

SEK events / are usually reported in different newspapers on the same day 16:39

SEK if you have &ah two newspapers / New York Times and the Washington Post // they are talking about the same thing if there is a big event 16:47

SEK I don't know / Netherlands beats Spain 16:49

SEK that's a big event / ok ? 16:51

SEK and &ah [///] so we can find this / expression about / Netherlands beats Spain 16:58

SEK &ah but / we have to find where it is 17:02

SEK and the key / is the named entity 17:05

SEK so / Spain = whatever that expression is / Netherlands / and Spain / or two to three // this expression must be the same 17:13

SEK so use these named &en [/] named entities / as anchor / we can find / paraphrase 17:21

SEK that's the idea 17:22

SEK ok ? 17:24

SEK and &ah we observed encouraging results 17:26

SEK ok ? 17:27

SEK so this is &a [/] again / this is the procedure 17:30

SEK we have two newspapers / Washington Post and New York Times maybe // and / xxx fast find a comparable article / talking about the soccer result of Spain 17:41

SEK ok ? 17:42

SEK then we named &en [/] we tag the named entity / among these / xxx 17:48

SEK so / &sa [/] &ah Spain and Netherlands etcetera 17:52

SEK ok ? 17:54

SEK then we / parse them // of course // and &ah we find the chunk / which is talking about this event 18:01

SEK so Netherlands beat / Spain // or Spain's [/] Spain's / beat by / &ah Netherlands etcetera 18:09

SEK ok ? 18:10

SEK then extract / this paraphrase 18:13

SEK ok ? 18:13

SEK this / is the method 18:15

SEK ok ? 18:17
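
A minimal sketch of the anchoring step, assuming the comparable articles are already found, tagged and cut into chunks. The function names, the slot notation and the two toy sentences are assumptions; the parsing step mentioned in the talk is left out.

```python
from itertools import product

def fill_slots(text, slots):
    """Replace each entity string by its slot label."""
    for name, slot in slots.items():
        text = text.replace(name, slot)
    return text

def anchor_paraphrases(article_a, article_b, min_shared=2):
    """Pair chunks from two comparable articles that share named entities.

    Each chunk is (text, {entity_string: entity_type}). Chunks sharing at
    least `min_shared` entities are taken to report the same event; the
    shared entities become numbered slots so argument order is preserved.
    """
    pairs = []
    for (text_a, ents_a), (text_b, ents_b) in product(article_a, article_b):
        shared = sorted(set(ents_a) & set(ents_b))
        if len(shared) < min_shared:
            continue
        slots = {
            name: f"<{ents_a[name]}_{i}>" for i, name in enumerate(shared, 1)
        }
        pairs.append((fill_slots(text_a, slots), fill_slots(text_b, slots)))
    return pairs

countries = {"Netherlands": "COUNTRY", "Spain": "COUNTRY"}
paper_a = [
    ("Netherlands beat Spain", countries),
    ("The match was played in Madrid", {"Madrid": "CITY"}),
]
paper_b = [("Spain was beaten by Netherlands", countries)]
paraphrases = anchor_paraphrases(paper_a, paper_b)
print(paraphrases)
```

The numbered slots keep track of which country is which, so the active and the passive expression can later be mapped onto the same table columns.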

SEK and this was / presented at some / conference two thousand two or three / etcetera 18:22

SEK and &ah this is the result / one of the results / based on Japanese newspaper 18:27

SEK &ah on / special / event like murder suspect 18:31

SEK ok ? 18:33

SEK and &ah / accuracy is [/] precision is something like / fifty / &sixt [/] sixty two 18:38

SEK and recall is something like forty 18:41

SEK we can not really find recall / because there are so many expressions in newspaper / and we can not / find all the / &ah paraphrase / in the newspaper 18:51

SEK so it's not easy to find the recall but / is something like / this area 18:55

SEK ok ? 18:57

SEK so / this is one of the method / &ah to find paraphrase 19:01

SEK ok ? 19:02

SEK and [///] ok 19:04

SEK I'm going to [///] yeah I have time {%com: having a look at his watch} 19:06

SEK so I'm going to talk the / another method / of the / paraphrase discovery 19:11

SEK this is called &ah [/] this is / &ah -> / done through relation discovery 19:18

SEK ok ? 19:19

SEK &ah this is slightly different at the beginning but / at the end / we will find the paraphrase 19:24

SEK ok 19:26

SEK motivation is / that / &ah for the relation discovery task // and &ah -> motivation for this / task is / &ah discovering particular relation between named entities / &ah for example between country / and / &ah / between / country name and person name 19:42

SEK there are many / different kinds of relations / like / president relation / and prime minister relation / or / coach of the soccer / of Spain // kind of relation 19:53

SEK and company-company relation there are / much the relation parent / child / relation etcetera 19:59

SEK and we can not / really prepare / how many / relations exist / between these pairs 20:06

SEK and this task is to find / those relations 20:10

SEK ok ? 20:11

SEK we don't know / the relationship / or the relationships in advance 20:15

SEK and we will try to find / as many relationships as possible 20:20

SEK ok ? 20:20

SEK and the basic idea is context based clustering 20:23

SEK and &ah / I &w [/] I will show you 20:27

SEK a much easier way {%com: whispers while looks for the transparency} so this is procedure but / I will describe it on this / example 20:32

SEK &ah for example &ah / in newspapers there are lots of / &ah expressions // like / &ah / is offering to buy / and / instances like &ah Disney / is offering to buy ABC // or this is interest in ABC // this is / negotiating to ABC // to acquire ABC etcetera // because this import / ABC 20:56

SEK and also between IBM and Lotus / there is a expression / similar expression because they bought the / company too 21:03

SEK so we / tag the named entity in the corpus 21:08

SEK and we can / &ah accumulate this / context / between / company-company / relation // or maybe / person-location relation // but xxx [/] for example this case / company-company relation 21:21

SEK and we found / &ah &pat [/] some particular / pair of / companies // we find lots of expressions like this 21:30

SEK so using this / this was as a feature / for clustering 21:36

SEK we can find the cluster / of / acquired relation 21:40

SEK you get idea ? 21:42

SEK ok ? 21:45

SEK so if you [/] if we cluster / based on this word / like buy / acquire / purchase / there's [/] there are lot of / xxx &ah / pair of companies which share this / words 21:56

SEK so / once we cluster them / we have a pair of companies / which / is acquired relations 22:03

SEK ok ? 22:04
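
A minimal sketch of context based clustering, assuming the contexts between each tagged pair have already been collected. The bag-of-words cosine similarity, the greedy grouping, the threshold value and the pair ("CompanyX", "CompanyY") are assumptions chosen for illustration, not the clustering method of the system.

```python
from collections import Counter
import math

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def cluster_pairs(contexts, threshold=0.3):
    """Greedily group entity pairs whose accumulated contexts are similar.

    `contexts` maps an entity pair to the list of expressions found
    between the two names in the corpus.
    """
    vectors = {
        pair: Counter(word for ctx in ctxs for word in ctx.split())
        for pair, ctxs in contexts.items()
    }
    clusters = []
    for pair, vec in vectors.items():
        for cluster in clusters:
            if any(cosine(vec, vectors[other]) >= threshold for other in cluster):
                cluster.append(pair)
                break
        else:
            clusters.append([pair])
    return clusters

contexts = {
    ("Disney", "ABC"): ["is offering to buy", "agreed to acquire"],
    ("IBM", "Lotus"): ["agreed to buy", "completed its purchase of"],
    ("CompanyX", "CompanyY"): ["is a subsidiary of", "a unit of"],
}
clusters = cluster_pairs(contexts)
print(clusters)
```

The first two pairs share words such as "buy" and "agreed" and end up in one cluster; the third pair is kept apart because its contexts describe a different relation.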

SEK and this is the result 22:06

SEK so / this is a result for / person and location / relation 22:11

SEK there are [/] we find / president relation / senator relation / prime minister relation / governor relation / secretary relation / republican relation / and coach relation // from [/] this is &ah New York Times / ninety five 22:23

SEK and &ah in [/] for example in / president relation there are twenty three &e [/] &ah examples were find 22:33

SEK and / seventeen of them / are / really correct / president relations 22:38

SEK so / it was / so so accurate 22:40

SEK and in this cluster / this was / a dominating / &ah president / and president in / small letter 22:48

SEK and / I don't know 22:50

SEK for governor relation we found / sixteen / and fifteen out of sixteen / are correct 22:55

SEK and / there are / words like these 22:58

SEK so / this cluster method find / &ah relationships between named entities 23:03

SEK ok ? 23:04

SEK ok ? 23:07

SEK so this is evaluation results 23:09

SEK cluster [/] so cluster &ten [///] we / evaluate / the clusters which have more than five / &ah membership 23:16

SEK and &ah / accuracy is hundred percent 23:19

SEK and / any pair level [///] so pair level means / accumulate all of these / &ah by / and all of this 23:27

SEK so this is &ah eighty nine percent 23:29

SEK so it's / good accuracy 23:31

SEK it's [/] the cluster &ah [/] the recall is not that high / &ah / but it's / about sixty percent 23:39

SEK ok ? 23:41

SEK and error [/] we / did error analysis 23:44

SEK and / for example / a expression like / Chechnya war may exhaust / &Presid &ah President Boris Yeltsin 23:51

SEK and / from this sentence / we found the relationship that / &ah Boris Yeltsin is the president of Chechnya // which is not true 23:58

SEK ok ? 23:59

SEK so that's wrong 24:01

SEK and / also / &ah the recall error / we missed / is something like / Boris Yeltsin on the end of fighting in Chechnya 24:10

SEK ok ? 24:11

SEK and / this can not be / acquired / because there's no / so much common words / between these names 24:21

SEK ok ? 24:21

SEK only fighting / and end of fighting is [/] doesn't happen so often in the newspapers / first of all 24:28

SEK and this words are not common 24:30

SEK so [/] so that's why recall is not high 24:33

SEK we have to have / more / expressions 24:36

SEK but [///] ok up to here / this is the relation discovery / method / &ah presented at &ah ACL &nine [/] two thousand four I think 24:47

SEK and then [///] well / once we've found lots of relations / nice relations / maybe that expression / exists in / particular cluster // is talking about the same thing 25:00

SEK you could have a / &ah buying relation / between companies // maybe / we have a [/] lots of / different / kinds of expressions / about &acquisi [/] acquisition of company 25:13

SEK so / but &the [/] of course there are many / &ah noises // even between Disney and / ABC 25:21

SEK there are expressions / there are expressions / xxx acquisition 25:25

SEK so we have to have a filter // to delete the noises 25:29

SEK so we have two filters 25:31

SEK ok ? 25:32

SEK the one condition is that &ah / the expression has to be used in / more than one pair of instances 25:39

SEK so it &ha [/] it can not be particular for one / particular pair of news &ah -> / &ah / pair of names 25:47

SEK for example / Disney and ABC &may [/] maybe there are / very / peculiar expression / between this two 25:53

SEK so we don't want [/] we don't want that 25:57

SEK ok ? 25:58

SEK and also expression has to contain frequent term / like &ah / buy / etcetera 26:03

SEK otherwise we have a lots of [/] lots of noises in different expressions 26:09

SEK so / based on this features we have a / expression like A bought B / A has agreed to buy B / A which is buying B / A's proposed acquisition of B / etcetera 26:20

SEK so these are very nice / &ah -> paraphrase 26:23

SEK ok ? 26:24
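
A minimal sketch of the two filters, applied to the expressions of one cluster. The thresholds, the stopword list, the toy lemma table, the pair ("CompanyX", "CompanyY") and the noisy expressions are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

STOPWORDS = {"A", "B", "a", "and", "has", "is", "to"}
LEMMAS = {"bought": "buy", "buying": "buy"}  # toy stand-in for a lemmatizer

def content_terms(expression):
    """Lemmatized content words of an expression (slots A and B removed)."""
    return [LEMMAS.get(w, w) for w in expression.split() if w not in STOPWORDS]

def filter_expressions(cluster, min_pairs=2, min_term_count=3):
    """Keep expressions that (1) occur with at least `min_pairs` entity
    pairs and (2) contain a term that is frequent in the cluster."""
    pairs_using = defaultdict(set)
    term_counts = Counter()
    for pair, expressions in cluster.items():
        for expr in expressions:
            pairs_using[expr].add(pair)
            term_counts.update(content_terms(expr))
    frequent = {t for t, c in term_counts.items() if c >= min_term_count}
    return sorted(
        expr
        for expr, pairs in pairs_using.items()
        if len(pairs) >= min_pairs and frequent & set(content_terms(expr))
    )

cluster = {
    ("Disney", "ABC"): [
        "A bought B",
        "A has agreed to buy B",
        "A is buying B",
        "A and B share a theme park deal",
    ],
    ("IBM", "Lotus"): ["A bought B", "A has agreed to buy B", "A said B"],
    ("CompanyX", "CompanyY"): ["A bought B", "A is buying B", "A said B"],
}
kept = filter_expressions(cluster)
print(kept)
```

The theme park expression is dropped by the first filter because only one pair uses it, and "A said B" is dropped by the second because it contains no frequent term.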

SEK so we can use this to cluster the patterns 26:27

SEK ok ? 26:28

SEK and this is another one [///] &o [/] ok // I'm not going to talk this but this is another one 26:34

SEK ok 26:37

SEK so / and &ah [///] ok 26:40

SEK next one / is a named entity 26:42

SEK so &ah / so / usually people use / five or six / named entities like person location organization / or / maybe facilities or / weapons // because we [/] we are working with &ah [/] &ah DARPA 26:59

SEK it's not enough for everything 27:03

SEK you may ask 27:04

SEK so we are creating named entities with / lots of varieties / including two hundred categories right now 27:11

SEK ok ? 27:13

SEK this is based on the / observation of the task / of information extraction / or question answering 27:20

SEK people were / interested in what kind of questions / what kind of types 27:24

SEK and &ah xxx or the capital letter in newspapers 27:29

SEK I got / several thousand of / capital letter words and cluster them // and try to find the categories 27:36

SEK and there's a definition &ah / hundred fifty pages HTML 27:43

SEK you can look at / and &ah that was released 27:46

SEK ok ? 27:48

SEK and / at the moment we have automatic tagger 27:51

SEK this is / rule-based 27:53

SEK yeah / because [///] ok / for eight categories / you can / create a training data / and machine learning can / do this job // but it's not easy for two hundred categories 28:05

SEK so we created dictionaries / and rules like / some capital letter followed [/] followed by &ah / mister / is a person name etcetera 28:15

SEK that's simple 28:16

SEK rules but / this is rule-based 28:18
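
A minimal sketch of a dictionary-plus-rules tagger in this spirit. The dictionary entries, the title rule and the function name `tag` are assumptions; the real tagger covers two hundred categories with far larger dictionaries and rule sets.

```python
import re

# Toy dictionary: known names mapped to their category.
DICTIONARY = {"Spain": "COUNTRY", "Netherlands": "COUNTRY", "Japan": "COUNTRY"}

# Toy rule: capitalized words right after a title are a person name.
TITLE_RULE = re.compile(
    r"\b(?:Mr|Mrs|Ms|Dr)\.?\s+([A-Z][a-z]+(?:\s[A-Z][a-z]+)*)"
)

def tag(text):
    """Return (string, category) pairs: rule matches first, then dictionary hits."""
    entities = [(m.group(1), "PERSON") for m in TITLE_RULE.finditer(text)]
    for word in re.findall(r"[A-Za-z]+", text):
        if word in DICTIONARY:
            entities.append((word, DICTIONARY[word]))
    return entities

tagged = tag("Mr. John Smith flew from Japan to Spain")
print(tagged)
```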

SEK and we got seventy percent accuracy 28:20

SEK it's not / that high but / for example / &ah tagging product name is quite difficult 28:27

SEK you know xxx &ah anything can be product name 28:30

SEK so / these are difficult things 28:33

SEK yeah people names are ninety percent 28:35

SEK it's just like / other things but still / improving 28:39

SEK ok 28:41

SEK and / right now / I'm trying to &extra [///] this is a strategy of [/] from the / information extraction xxx ODIE 28:49

SEK but / at the moment &ah [///] = ok / we / have this two hundred categories / &ah &m [/] &ex [/] examples &ah / on the location / there is &ah / continental / or / &ah / domestic region / etcetera // and / among GPE / for example &ah / geological and / political entity / like / country / city / province / etcetera 29:15

SEK and these are [/] these are nice names // categories of names 29:19

SEK but / &ah usually / proper names / has [/] has / expressed by / some symbol / like Satoshi Sekine is me // but I have lots of attributes 29:32

SEK I'm [/] I'm Japanese // nationality Japanese 29:35

SEK I'm tall as this 29:37

SEK and &ah / something else 29:40

SEK and this is very important for names / because name [///] the suffix [/] the string of the name is / whatever it is / Satoshi Sekine / hhh {%act: spells his name} / that's / symbol 29:51

SEK but my / property is my / attribute // expressed [/] can be expressed by attributes 29:57

SEK so I thought that it's very interesting to / categorize &ah or / create attributes for / each named entity 30:05

SEK and this is example / for the person attributes 30:10

SEK so person / should have / vocation / nationality / career / masterpiece / &ah graduate from these schools / hometown / etcetera 30:21

SEK and once / we have this kind of attributes and / that can be used for information extraction / or question-answering / etcetera 30:30

SEK for example / question-answering / when / we look at questions / in question-answering task // and people sometimes ask the attribute of the names / like / what is the height of Mount Fuji ? 30:41

SEK then / if the / information is structured like this / then / &thus [/] this question-answering can be answered / by / just a / SQL / type of question // &ah manipulation 30:54

SEK or maybe / some people ask that / what is the [/] what is the highest mountain in Japan ? 31:01

SEK then we [/] we have these data / then we can answer / by / a database manipulation 31:07
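
A minimal sketch of answering both questions by database manipulation, using Python's built-in `sqlite3` module. The table name, the column names and the choice of rows are assumptions; the point is only that attribute questions become simple queries once the attributes are structured.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mountain (name TEXT, country TEXT, height_m INTEGER)"
)
conn.executemany(
    "INSERT INTO mountain VALUES (?, ?, ?)",
    [
        ("Mount Fuji", "Japan", 3776),
        ("Mount Kita", "Japan", 3193),
    ],
)

# "What is the height of Mount Fuji?"
height = conn.execute(
    "SELECT height_m FROM mountain WHERE name = ?", ("Mount Fuji",)
).fetchone()[0]

# "What is the highest mountain in Japan?"
highest = conn.execute(
    "SELECT name FROM mountain WHERE country = ? "
    "ORDER BY height_m DESC LIMIT 1",
    ("Japan",),
).fetchone()[0]

print(height, highest)
conn.close()
```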

SEK so / this is the way to / &ah structure the information in the world / I believe 31:13

SEK so I'm working on this 31:15

SEK &ah / ok 31:17

SEK anyway / so / &ah / briefly I explain / what the pattern discovery is / and what the paraphrase discovery is and / named entities 31:28

SEK and we make / this system works / ok ? 31:32

SEK &ah / ok / that's it 31:34

SEK so / I can show you [/] I can / accept any / question if you have 31:41

SEK and / let's see 31:46

SEK I will show you 31:48

SEK I want to prove that this is not fake 31:50

SEK this is / real / demo 31:52

SEK ok ? 31:56

SEK this is sentence 31:57

SEK and / I did the evaluation 32:00

SEK so / &ah it [/] this is [/] this can be done by [/] only by subjective evaluation 32:06

SEK it's not to say / this accurate or not 32:09

SEK so / I asked several people / to / look at the table / and pick up / twenty topics 32:15

SEK &ah / I showed in the / demo 32:17

SEK there is a ACE topic 32:18

SEK and twenty of them happens [/] appears a lot in the newspaper / ok ? 32:23

SEK and / we learned it and / created a table / and asked them if that is / very useful / or useful / or not useful / for the further search 32:32

SEK yeah / I'm not [/] I'm not trying to convince / this / as a final result 32:37

SEK this is only the &m [/] intermediate step / to / get the real information 32:42

SEK so this [/] if this help / for you / to search more information / I'm happy / ok ? 32:49

SEK so / I asked them if / actually / this table / is final result [/] useful as a final result = I'm sorry / this has a typo {%com: referring to the transparency} // &ah / very [/] &ah [///] ok 33:01

SEK I asked them / to say / this is very useful 33:04

SEK and if / they found this is / good [/] good / table / in order to do more search / and / they judge as useful 33:14

SEK and / if the table is xxx then / they said / this is not useful / ok ? 33:20

SEK out of twenty / two are very useful / twelve are useful / and six are not useful 33:25

SEK so / this is reasonable / result I believe / ok ? 33:30

SEK and &ah / and also / correctness of the table fillers 33:36

SEK that's can be / evaluated easily 33:38

SEK so / pick up hundred random / rows of the table // and how accurate they are 33:44

SEK and / &ah -> / out of hundred / eighty four / correct / and four partially correct / and twelve incorrect 33:52

SEK so it's [///] there are some mistakes there // too 33:55

SEK &ah sometimes it's / the event / which is not related to 34:00

SEK for example Nobel Prize / like xxx on that is incorrect 34:03

SEK and also there are / incorrect / based &o [/] &ah because of the named entities 34:08

SEK named entity tagger made / lots of / errors so / it's [/] that is true 34:13

SEK so / maybe &i [/] in the person / look / a person xxx / maybe the company / appears on the / person xxx that's incorrect / ok ? 34:23

SEK so / that is it 34:24

SEK but &ah there are many ways to improve this 34:27

SEK &ah for example / one of them is / this / label / ok ? 34:32

SEK I just say / company / or money / or date 34:35

SEK this is category name for named entities 34:37

SEK but there is a [/] there are xxx for / the event / for / for example this company / acquired the company / &acqui [/] &acqui [/] acquiring company and acquired the company 34:48

SEK or this is amount of money / of the deal 34:52

SEK or this is / something else 34:54

SEK so / I want to put some / meaningful label for this 34:58

SEK and this is what / I'm working on right now 35:01

SEK and &ah -> / also [///] other things I'm working on {%com: whisper} 35:07

SEK I [/] I try to speed up this / &ah / one &mi [/] one second [/] one / minute is still / slow 35:13

SEK so I have the idea to / at least double the speed 35:17

SEK and &ah / yeah 35:20

SEK also = yeah 35:21

SEK the one of the things / I'm very interested in is / improving named entity accuracy 35:26

SEK it's still seventy percent 35:27

SEK so it's not good enough 35:29

SEK so / there are several research I'm going on [/] I'm / working on / to improve the named entities 35:35

SEK ok 35:38

SEK so / I described / these three / IE pattern discovery / paraphrase discovery / relation discovery / in the [/] on-demand IE / and named entity / and attribute for a named entity // to describe all [/] all of these / ok ? 35:54

SEK {%com: looking at his watch} it's about [/] well / I &do [/] I don't know / one hour 35:57

SEK I / < can keep / talking > 36:00

ANT [<] < &ye [/] yes > 36:00

SEK ¬ something else too 36:01

ANT ok 36:03

SEK ok 36:03

SEK maybe I can stop here and / get a question on / on demand information extraction 36:08