In Soviet Russia, eBook Reads You

Last week an article in the Wall Street Journal about eBooks tracking the habits and actions of readers got passed around the usual social media outlets. There didn’t seem to be much of a reaction; I think the article merely reconfirmed a belief that the eReader companies have been amassing data on their users. The article stands out since it mentions some specific observations drawn from the data: people read non-fiction in starts and stops, series like The Hunger Games and Fifty Shades are read back-to-back-to-back, and the data even shows the average time it takes people to finish books in the Game of Thrones series. These are insights that the publishing industry has not been able to reliably track (if at all, as the article suggests), and they represent a new tool that can analyze a reader base for additional extrapolated information.

However, in this rush to collect data on readers in order to better market or write books for them, there are a couple of things that don’t sit right with me. It leaves me wondering how comparable the eReader reading experience is to the print reading experience. Is picking up a book the same as picking up a Kindle? There doesn’t appear to be any baseline to act as a basis of comparison. The presumption here is that the reading experience is the same between the two mediums when that has yet to be shown. In trying to design a ‘better’ reading experience, they are relying on the data of one medium, which may or may not translate to the other.

As eReader use is still in the minority in the United States, what kind of demographics are being measured? It’s grabbing data from the digitally invested even as the book market still has a strong paper element to it. While it can’t be ignored that a rough usage rate of 1 in 5 people has pushed the eBook market to equal and/or surpass aspects of the print book market, it remains to be seen how this aligns with the overall reader population. Are these people a good sample?

In approaching the content side, the article talks about how publishers and authors could create books that avoid some of the ‘stopped reading’ pitfalls. This seems like an attempt to thwart Nancy Pearl’s Rule of 50. (Give a book 50 pages. When you get to the bottom of Page 50, ask yourself if you’re really liking the book. If you are, of course, then great, keep on reading. But if you’re not, then put it down and look for another.) I’m wondering why this is a good idea in an activity that is highly subjective and personal. The data might tell you when people stop reading a book, but it doesn’t scratch the surface as to why. Most people might stop reading a book at page 75, but that doesn’t necessarily mean anything. The defect in the story or the failure to grab the reader’s attention could have happened on any page prior to it; that’s just when they gave up on it. It’s a data point that lacks the necessary context to make it truly valuable.

I’m a bit concerned as to what the feedback will do to the writing process. If an author is told that a particular passage really resonated with readers, how will that change their ideas and approach to their stories and style? Will it take their work in new directions? Will it encourage authors to write towards these safe and favorable areas rather than push the boundaries of their craft? Or could it move the reader and author into greater sync in terms of what people want and what the author can provide?

Finally, my last and greatest concern regards privacy. It runs the gamut from who is being followed and what is being collected to whether it will have a chilling effect on subject/title selection. In New Jersey, library record privacy is regulated by statute.

18A:73-43.1. "Library," "library record" defined
    For the purposes of this act:
   a.   "Library" means a library maintained by any State or local governmental agency, school, college, or industrial, commercial or other special group, association or agency, whether public or private.
   b.   "Library record" means any document or record, however maintained, the primary purpose of which is to provide for control of the circulation or other public use of library materials.

18A:73-43.2. Confidentiality; exceptions
    Library records which contain the names or other personally identifying details regarding the users of libraries are confidential and shall not be disclosed except in the following circumstances:
   a.   The records are necessary for the proper operation of the library;
   b.   Disclosure is requested by the user; or
   c.   Disclosure is required pursuant to a subpoena issued by a court or court order.
   L. 1985, c. 172, s. 2, eff. May 31, 1985.

(Retrieved July 1, 2012. Emphasis mine.)

I’ve wondered if the Overdrive-Amazon connection is in breach of this law. From my understanding, Overdrive has enough access to make certain that a library member is authorized to borrow a book; they get the library card number, our system tells them whether or not it is valid, and then the transaction goes forward. In downloading the book to a Kindle, Amazon takes on the role of the library by maintaining and enforcing the borrowing period. They also have personally identifying data that goes along with the Amazon account. In handing off the transaction from Overdrive to Amazon, how much information is exchanged? Is it just a nod or is there more?

Now, it might be argued that it constitutes a legal disclosure as it relates to the proper operation of the library. If that’s the case, then it becomes a question as to whether this is a good idea or not (or even possibly an ethical one). In any event, it is handing over readers for data mining without getting a share of that information snapshot or giving proper warnings to the library member as to what they could be exposed to. (Yes, I know some libraries have warning screens when shifting from the library website to Amazon. Good on them and I can’t find a good link to it.) In the rush to get eBooks on our virtual shelves, we bought licenses which do not convey the benefits of ownership and handed over our library members to third parties who do not wholly share our library values and principles.

What do you think?

The AP is Mad as Hell and It’s Not Going To Take It Anymore

The Associated Press, a widely recognized, prize-winning news organization, has decided that it is not going to take it anymore. They are looking to control their content by adding a “digital wrapper” to stories so as to ensure that they are being read through licensed sources. This is intended to thwart unauthorized search engines and aggregators who derive profit through ads placed next to links to AP stories. Also, it will allow them to determine what is being read on individual computers and through what sites people are gaining access to those stories. Furthermore, they want news sites that use their content to run the same software as part of a “digital permissions framework” that would inform the publisher of their permission obligations with individual stories.

I can’t even begin to describe all of the major problems and issues of this move (announced earlier this year but beginning to be implemented now). I think the right metaphor sounds something like this: after realizing they had closed the barn door sans horses, the AP is going to where the horses are and attempting to build a new barn around their current position. Their next announcement has to be the invention of time travel, which will allow them to go back to the point in time where internet practices and customs were being formed, insert their business model, and destroy this future free internet content timeline.

All kidding aside, there are some immediate concerns. First, while they have given assurances that no private information will be gathered, how can this be guaranteed? There is no denying the fact that a little piece of digital code is reporting information about a reader back to a centralized information center. (I’m sure that privacy advocates will have a field day with that one.) Second, what amount of web traffic constitutes the need for a site to obtain a license? While they have indicated that they are not interested in going after bloggers, their actions in the past have indicated otherwise. (And their announcement that even “minimal use” would require a license is not very convincing.) Third, what about web tagging sites like Delicious and Diigo? Does the sharing of links through these third-party sites constitute a need for licensing (for me or for the site)? Could aggregations of AP stories through these sites be considered a trigger condition for licensing? Fourth, what exactly does this mean for search engines? While the major players in the search engine field have licenses with the AP, how will their content control affect the results of a search? (On a related note, if I were an AP shareholder, I would be asking how this would not drive news content consumers to use other wire services such as CNN, the BBC, and Reuters.)

The big looming issue here is that of copyright and fair use. As a librarian, I really can’t see how the AP is going to do an end run around fair use. Titles are not copyright protected and the use of a fraction of the total words of an article does not create a copyright breach. While I can appreciate and understand their desire to protect what they have created, this is not the way to do it in the current business and computer culture environment. (I couldn’t even find one article that applauded this move for this post.)

We live in a connection culture where information and ideas are passed from person to person through links. And the more links you have to something, the more likely it is to be seen by others. Taking away those links is lowering the chances of your content being seen and passed to others. When companies are making billions of dollars through linking, why would you restrict or confiscate the very things that drive traffic and revenue? It makes no sense in light of other free content examples. (e.g. New York Times.) It’ll be interesting to see how it does play out, but I have a feeling I know how this one ends.

This is not the last call for the end of free content on the Internet. But it should be the last call for companies to stop trying to apply 20th century solutions to 21st century issues.