<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments on: The Reports of Our Professional Deaths Have Been Greatly Exaggerated, Part 2</title>
	<atom:link href="http://agnosticmaybe.wordpress.com/2010/10/31/the-reports-of-our-professional-deaths-have-been-greatly-exaggerated-part-2/feed/" rel="self" type="application/rss+xml" />
	<link>http://agnosticmaybe.wordpress.com/2010/10/31/the-reports-of-our-professional-deaths-have-been-greatly-exaggerated-part-2/</link>
	<description>the neverending reference interview of life</description>
	<lastBuildDate>Fri, 24 May 2013 07:03:22 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
	<item>
		<title>By: Tweets that mention The Reports of Our Professional Deaths Have Been Greatly Exaggerated, Part 2 « Agnostic, Maybe -- Topsy.com</title>
		<link>http://agnosticmaybe.wordpress.com/2010/10/31/the-reports-of-our-professional-deaths-have-been-greatly-exaggerated-part-2/#comment-2206</link>
		<dc:creator><![CDATA[Tweets that mention The Reports of Our Professional Deaths Have Been Greatly Exaggerated, Part 2 « Agnostic, Maybe -- Topsy.com]]></dc:creator>
		<pubDate>Mon, 01 Nov 2010 06:09:40 +0000</pubDate>
		<guid isPermaLink="false">https://agnosticmaybe.wordpress.com/2010/10/31/the-reports-of-our-professional-deaths-have-been-greatly-exaggerated-part-2/#comment-2206</guid>
		<description><![CDATA[[...] This post was mentioned on Twitter by Andy Woodworth, Tina Reynolds. Tina Reynolds said: RT sounds good! @wawoodworth: Blogged: The Reports of Our Professional Deaths Have Been Greatly Exaggerated Part 2 http://bit.ly/9jRDtF [...]]]></description>
		<content:encoded><![CDATA[<p>[...] This post was mentioned on Twitter by Andy Woodworth, Tina Reynolds. Tina Reynolds said: RT sounds good! @wawoodworth: Blogged: The Reports of Our Professional Deaths Have Been Greatly Exaggerated Part 2 <a href="http://bit.ly/9jRDtF" rel="nofollow">http://bit.ly/9jRDtF</a> [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dr. Dale</title>
		<link>http://agnosticmaybe.wordpress.com/2010/10/31/the-reports-of-our-professional-deaths-have-been-greatly-exaggerated-part-2/#comment-2205</link>
		<dc:creator><![CDATA[Dr. Dale]]></dc:creator>
		<pubDate>Mon, 01 Nov 2010 03:44:45 +0000</pubDate>
		<guid isPermaLink="false">https://agnosticmaybe.wordpress.com/2010/10/31/the-reports-of-our-professional-deaths-have-been-greatly-exaggerated-part-2/#comment-2205</guid>
		<description><![CDATA[I would like to offer some technical perspective on this issue and to motivate the usefulness of librarians in this problem. I&#039;ve dealt with information classification at my job (although only in passing, since we decided not to pursue further research into it at the moment). I would group this problem of information overload / classification / retrieval under the auspices of applied computer science. In particular, the relevant areas of research are machine learning, natural language processing, and text (document) classification.

I&#039;m going to limit what I discuss to textual (document) information that is available in electronic form. The question is how to automatically classify documents into a taxonomy (and how to define such a taxonomy). Because there is such an abundant quantity of information, it will require farms of computers to do the work of document association and classification. Since we would like a computer to do this, we need to develop algorithms for the job, and these algorithms must be based on some mathematical framework.

Some of the issues that must be addressed in this mathematical framework are which features of a document are necessary for its classification, how to quantify the similarity between documents, and how to cluster the resulting data points. Depending on how you frame it (classifying into a predefined taxonomy versus discovering the groupings from the data), this is a supervised or an unsupervised machine learning problem. On top of this there is the more subtle (and difficult) problem of training a computer to learn and detect contextual (semantic) information in documents.

Ultimately this is a nontrivial problem and the focus of current research. Of course, practical progress has been made on classification of information, and some have managed to make enormous sums of money from their efforts (see Google...).

Now to provide my small plug for librarians... Librarians could definitely contribute (among other things) by providing researchers with the area expertise needed to develop better features (measurements defined on documents) and better metrics of similarity. Basically, the questions are what should be measured in a document to help it be classified correctly (for example, the percentage of words in a document that are &quot;literary terms&quot; vs. other preset classes of words) and how to successfully compare two documents for similarity using those measurements.]]></description>
		<content:encoded><![CDATA[<p>I would like to offer some technical perspective on this issue and to motivate the usefulness of librarians in this problem. I&#8217;ve dealt with information classification at my job (although only in passing, since we decided not to pursue further research into it at the moment). I would group this problem of information overload / classification / retrieval under the auspices of applied computer science. In particular, the relevant areas of research are machine learning, natural language processing, and text (document) classification.</p>
<p>I&#8217;m going to limit what I discuss to textual (document) information that is available in electronic form. The question is how to automatically classify documents into a taxonomy (and how to define such a taxonomy). Because there is such an abundant quantity of information, it will require farms of computers to do the work of document association and classification. Since we would like a computer to do this, we need to develop algorithms for the job, and these algorithms must be based on some mathematical framework.</p>
<p>Some of the issues that must be addressed in this mathematical framework are which features of a document are necessary for its classification, how to quantify the similarity between documents, and how to cluster the resulting data points. Depending on how you frame it (classifying into a predefined taxonomy versus discovering the groupings from the data), this is a supervised or an unsupervised machine learning problem. On top of this there is the more subtle (and difficult) problem of training a computer to learn and detect contextual (semantic) information in documents.</p>
<p>Ultimately this is a nontrivial problem and the focus of current research. Of course, practical progress has been made on classification of information, and some have managed to make enormous sums of money from their efforts (see Google&#8230;).</p>
<p>Now to provide my small plug for librarians&#8230; Librarians could definitely contribute (among other things) by providing researchers with the area expertise needed to develop better features (measurements defined on documents) and better metrics of similarity. Basically, the questions are what should be measured in a document to help it be classified correctly (for example, the percentage of words in a document that are &#8220;literary terms&#8221; vs. other preset classes of words) and how to successfully compare two documents for similarity using those measurements.</p>
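<p>As a rough illustration of the kind of measurement described above, here is a minimal Python sketch, not a real classifier. It assumes a couple of hypothetical preset word classes (the word lists and sample sentences are made up for the example), measures the fraction of each document's words that fall into each class, and compares documents with cosine similarity, which is only one of many possible similarity metrics.</p>

```python
import math
import re

# Hypothetical word classes; in practice a librarian would curate these lists.
WORD_CLASSES = {
    "literary": {"metaphor", "narrator", "stanza", "plot", "novel"},
    "technical": {"algorithm", "classifier", "dataset", "vector", "cluster"},
}


def class_fractions(text):
    """Return, for each word class, the fraction of the document's words in it."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words) or 1  # avoid dividing by zero on an empty document
    return [sum(w in vocab for w in words) / total
            for vocab in WORD_CLASSES.values()]


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Made-up sample documents, one feature vector per document.
documents = {
    "a": "The narrator builds the plot around a single metaphor.",
    "b": "Each stanza of the novel repeats one metaphor.",
    "c": "The algorithm assigns every vector to a cluster.",
}
features = {name: class_fractions(text) for name, text in documents.items()}

# Documents a and b both use only "literary" words, so they point the same way;
# document c uses only "technical" words, so it shares nothing with a.
print("a vs b:", cosine_similarity(features["a"], features["b"]))
print("a vs c:", cosine_similarity(features["a"], features["c"]))
```

<p>The interesting design decisions are exactly the ones where area expertise matters: which word classes to define, which words belong in each, and whether a simple fraction is the right thing to measure in the first place.</p>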
]]></content:encoded>
	</item>
</channel>
</rss>
