<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:media="http://search.yahoo.com/mrss/" xmlns:evnet="http://www.mscommunities.com/rssmodule/"><channel><title>Perspectives</title><image><url>http://perspectives.on10.net/images/10logo_100.jpg</url><title>Perspectives</title><link>http://perspectives.on10.net/</link></image><description>Jon Udell interviews passionate Microsoft innovators</description><link>http://perspectives.on10.net/</link><language>en-us</language><pubDate>Thu, 15 May 2008 16:19:22 GMT</pubDate><generator>EvNet (EvNet, Version=1.0.3034.38425, Culture=neutral, PublicKeyToken=null)</generator><item><title>Where is WinFS now?</title><description>&lt;p&gt;WinFS was an ambitious effort to embed an integrated storage engine into the Windows operating system, and use it to create a shared data ecosystem. Although WinFS never shipped as a part of Windows, many of the underlying technologies have shipped, or will ship, in SQL Server and in other products. In this interview Quentin Clark traces the lineage of those technologies back to WinFS, and forward to their current incarnations. &lt;/p&gt;
&lt;table&gt;
    
        &lt;tr&gt;
            &lt;td&gt;
            &lt;p&gt;&lt;img width="300" alt="" src="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/winfs/quentin2.jpg" /&gt; &lt;/p&gt;
            &lt;p&gt;&lt;b&gt;Quentin Clark&lt;/b&gt; led the WinFS project from 2002 to 2006. He's now a general manager in the SQL Server organization. &lt;/p&gt;
            &lt;/td&gt;
        &lt;/tr&gt;
    
&lt;/table&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: You made a fascinating remark last time we spoke, which was that most of WinFS either already has shipped, or will ship. I think that would surprise a lot of people, and I'd like to hear more about what you meant by that. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;QC&lt;/b&gt;: WinFS was about a lot of things. In part it was about trying to create something for the Windows platform and ecosystem around shared data between applications. Let's set that aside, because that part's not shipping. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: So you mean schemas that would define contacts, and other kinds of shared entities? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;QC&lt;/b&gt;: Yeah. That's a mechanism, a technology required for that shared data platform. Now the notion of having that shared data platform as part of Windows isn't something we're delivering on this turn of the crank. &lt;/p&gt;
&lt;p&gt;We may choose to do that sometime in the future, based on the technology we're finishing up here, in SQL, but it's not on the immediate roadmap. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: OK. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;QC&lt;/b&gt;: Now let's look under the covers, and ask what was required to deliver on that goal. It's about schemas, it's about integrated storage, it's about object/relational, a bunch of things. And that's the layer you can look at and say, OK, the WinFS project, which went from ... well, it depends who you ask, but I think it went from 2002 until we shut it down in 2006 ... what was the technology that was being built for that effort, in order to meet those goals? And what happened to all that stuff? &lt;/p&gt;
&lt;p&gt;You can catalog that stuff, and look at work that we're doing now for SQL Server 2008, or ADO.NET, or VS 2008 SP1, and trace its lineage back to WinFS. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: Let's do that. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;QC&lt;/b&gt;: OK. I guess we can start at the top, with schemas. We're not doing anything with schemas. At the end of the WinFS project we had settled on a set of schemas. It was a very typical computer science problem, where the schemas started out as a super-small set of things, and then became the inclusion of all possible angles, properties, and interests of anybody interested in that topic whatsoever. We wound up with a contact schema with 200 or 300 properties. &lt;/p&gt;
&lt;p&gt;Then by the time we shipped the WinFS beta we were back down to that super-small subset. Here's the 10 things about people that you need to know in common across applications. &lt;/p&gt;
&lt;p&gt;But all that stuff is gone. The schemas, and a layer that we internally referred to as base, which was about the enforcement of the schemas, all that stuff we've put on the shelf. Because we didn't need it. It was for that particular application of all this other technology. &lt;/p&gt;
&lt;p&gt;So that's the one piece that didn't go anywhere. &lt;/p&gt;
&lt;p&gt;Next layer down is the APIs. The WinFS APIs were a precursor to a more generalized set of object/relational APIs, which is now shipping as what we call the Entity Framework in ADO.NET. &lt;/p&gt;
&lt;p&gt;What's getting delivered as part of VS 2008 SP1 is an expression of that, which allows you to describe your business objects in an abstract way, using a fairly generalized entity/relationship model. In fact we got &lt;a href="http://portal.acm.org/citation.cfm?doid=1247480.1247532"&gt;best paper at SIGMOD last year&lt;/a&gt; on the model, it's a very good piece of work. &lt;/p&gt;
&lt;p&gt;So you describe your business entities in that way, with a particular formal language... &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: For people who haven't seen this, how would you characterize that language? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;QC&lt;/b&gt;: It's pretty standard entity-relational. It's really a matter of describing to the system a set of properties and collections and relationships among entities. The important thing we tell people is to describe their entities as they think about them. Not as they think they should be expressed in a fully normalized database schema, and not as they need to program to them as objects, but in terms of how they think about them, and want to be able to report on them, or interact with them. &lt;/p&gt;
&lt;p&gt;From there we can derive objects you can program against, we can derive schemas to build a store of them. &lt;/p&gt;
&lt;p&gt;The traceback to WinFS is that we had a very fixed way of doing this for a particular set of entities. We built the schema around items, and items were entities that had relationships to other items. We built this whole model on a more generic substrate that we never expressed. &lt;/p&gt;
&lt;p&gt;So we said OK, we didn't ship the WinFS APIs, but we have this asset, a more generalized expression framework for entities, let's figure out how to finish that work up, and get that delivered as part of the next ADO release. &lt;/p&gt;
&lt;p&gt;This stuff is now very well integrated with LINQ. You can do LINQ to relational, where LINQ will look down into the database, look at the schemas that are there, and express that directly up into LINQ. Or you can do LINQ to entities, which allows you to have a layer of abstraction between what you're programming to and your underlying physical database schema. &lt;/p&gt;
&lt;p&gt;That work is ongoing, we're getting good feedback, we'll see how far it takes us. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: How much continuity is there in terms of the team? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;QC&lt;/b&gt;: A lot. When I did the reorg, I had an Excel sheet of everyone in the organization and where we were moving them to. Last I looked at it, 80-plus percent of the team was still in SQL somewhere. &lt;/p&gt;
&lt;p&gt;One of the interesting things about WinFS was that we started hiring a different kind of person. The database team is full of traditional hardcore systems database guys. When we did WinFS we were looking for a different thing. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: In fact you don't consider yourself to be a hardcore database guy, right? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;QC&lt;/b&gt;: Right. I'm a good example. I started at Microsoft in the Word group, and went from there to IIS to something called Application Center, worked on the manageability technologies for a while, and then was asked to come over and do WinFS. So my background was much more about how to use databases, how do you build apps around them, and not so much what are the internal algorithms you should use for bitmap indexing. &lt;/p&gt;
&lt;p&gt;Of course we had a lot of folks from the core database team, but we hired a lot of folks that had experience with compilers, with user interfaces, with building apps on the database. A lot of those folks who were leading the API effort for WinFS are now leading the API effort for all of SQL. &lt;/p&gt;
&lt;p&gt;So that's the story for the API team. As for the rest of it, well, there's obviously a big chunk around file systems. If you want to do this shared data model, you want it to be applicable to all data, not just things you can express relationally. So we had to figure out how to merge database constructs with file systems. &lt;/p&gt;
&lt;p&gt;A lot of people thought this was impossible, and would harken back to Cairo and various other projects announced and unannounced to the public world around integrated storage, that didn't necessarily produce fruit. &lt;/p&gt;
&lt;p&gt;We had one key advantage. We found an architectural approach that allowed us to control the semantics, and provide transactional database consistency over the files that were involved, while still allowing the file system to be in control when it came to file-handle-level operations. &lt;/p&gt;
&lt;p&gt;We did it with a kernel driver that allowed us to control the namespace, and keep the database involved. The database lives up in user mode. As far as the operating system is concerned, there's no difference between SQL Server and Microsoft Word. They're high-level user-mode apps that occasionally drop down and make requests of the kernel. &lt;/p&gt;
&lt;p&gt;So there was a fundamental disconnect. How do we maintain control over this low-level system concept, the file system, by a user-mode app? We built a kernel-level driver to communicate back to the user-mode SQL process. It had a cache of what things should look like, and what things are in what state, but it was there along the API path for the file system, to allow it to control the namespace operations over files that were "in" WinFS. &lt;/p&gt;
&lt;p&gt;People would often ask me if WinFS was a file system, and I'd struggle with the answer to that, because, well, you know, from a certain standpoint the answer is yes. The stuff I saw in the shell, was it in the WinFS filesystem? Well, OK. But there are no streams inside the database. So from a user perspective, those files were "in" the filesystem. But from an API perspective it was more nuanced than that. I could still use the Win32 APIs, get some file, open it, and from that point forward the semantics were exactly like NTFS. Because it &lt;i&gt;was&lt;/i&gt; NTFS at that point. &lt;/p&gt;
&lt;p&gt;There was a certain place along the API chain where the database was completely out of the way. This allowed us to get the perfect compatibility that had tripped up other integrated storage efforts in the past. Other efforts tried to get this compatibility by emulating all the Win32 APIs, which is tough. And the performance bar is very high. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: So how does this carry forward, if it does? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;QC&lt;/b&gt;: It does. That approach was so good that we decided to generalize it for SQL Server 2008, as a feature called filestream. It's basically a new kind of blob support for the database. You configure a column for filestream, you can take a file and insert it as a record, you get back a file handle, you can stream things into that file handle. You can do queries and get back file handles, and get streaming API-level NTFS performance on the files you put in there. &lt;/p&gt;
&lt;p&gt;What we have not done is the namespace support. So you don't get to walk through a directory of files. You examine a row, you ask that row to give you back the right token, you start doing the Win32 operations on it. &lt;/p&gt;
&lt;p&gt;But the rest is integrated. You back up the database, you back up the filestream. From most perspectives -- except mirroring, which we didn't get to fully integrating -- it looks like any other blob. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: Where do you see that being used to good effect? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;QC&lt;/b&gt;: Right now there's a choice people have to make. There's a size limit on blobs in the database, because we put them inside database pages, and that leads to a performance problem as well. If you want to pull a 2-gigabyte stream out of the database with traditional blobs, it's not as performant as walking up to NTFS and using a file handle. We have to recreate the file by putting together a series of database pages that are themselves a level of indirection on file system pages. &lt;/p&gt;
&lt;p&gt;So people today have to make a choice. Do I want the integration with the database, so backup works, my transactional semantics work, all this stuff works, and live with the performance and size limitations? Or do I want the best possible performance, and basically no limitations on size, by putting things in the file system, and then having my application logic figure out how to glue together the database world and these files that are now strewn about the file system? And when I do a backup, then I also have to teach my operations guys that when you back up the database you're not backing up all the data, you also have to worry about these files the database knows nothing about. &lt;/p&gt;
&lt;p&gt;With filestream, people don't have to make the choice. They get the performance they want, with the database integration they expect. &lt;/p&gt;
&lt;p&gt;Now the next place to take that, after 2008, is to add Win32 support. So we did this other feature as part of WinFS, which we're calling hierarchical ID. It's a column type, a new column type, which creates hierarchy support in the database. &lt;/p&gt;
&lt;p&gt;We did this for WinFS because obviously if you're storing your data in a filesystem-like hierarchy, you need to be able to do things like show me all the stuff in this folder, and answer that query lickety-split. You can't be walking through record by record looking for matches. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: Or dealing with the SQL way of expressing hierarchy, which is doable but beyond my comprehension. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;QC&lt;/b&gt;: Yeah, it's hard. The fundamental problem is that the query processor doesn't understand the concept of path. It understands matches on columns. It can find substrings within records, but it's kind of brute force. You can use fulltext indexing, but... &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: ... but you don't get containment for free. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;QC&lt;/b&gt;: That's right. So hierarchical ID is a column type that teaches the optimizer about hierarchy, about path, so you can do queries that find all the things contained within this part of the path. &lt;/p&gt;
&lt;p&gt;So we have that feature also shipping in 2008, and there are all sorts of different uses for it. For example, people use it for compliance. They'll create a hierarchy of different confidentialities and compliance levels. This thing is confidential, which is a superset of things that are executive-eyes-only. Hierarchies like that are just out there in the world. &lt;/p&gt;
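&lt;p&gt;&lt;i&gt;Sketch:&lt;/i&gt; a minimal Python illustration of why a path-aware key makes containment queries cheap. It assumes each node's key is a tuple of child ordinals from the root, a made-up encoding chosen for clarity and not SQL Server's actual storage format. Because sorted tuples keep a subtree contiguous, "everything under this node" becomes a single range scan. The sample rows and the descendants helper are hypothetical. &lt;/p&gt;
&lt;pre&gt;
```python
from bisect import bisect_left

# Hypothetical rows: (path key, payload). A path key is a tuple of child
# ordinals from the root, so (1, 2) is the second child of the first child.
rows = sorted([
    ((1,), "Confidential"),
    ((1, 1), "Executive eyes only"),
    ((1, 1, 1), "Board minutes"),
    ((1, 2), "Legal hold"),
    ((2,), "Public"),
    ((2, 1), "Press releases"),
])
keys = [key for key, _ in rows]


def descendants(ancestor):
    """Return payloads at or below a non-empty `ancestor` key."""
    # Sorted tuples keep a subtree contiguous: every key that starts with
    # `ancestor` sorts at or after it and before its next sibling.
    lo = bisect_left(keys, ancestor)
    next_sibling = ancestor[:-1] + (ancestor[-1] + 1,)
    hi = bisect_left(keys, next_sibling)
    return [payload for _, payload in rows[lo:hi]]


print(descendants((1,)))
print(descendants((1, 1)))
```
&lt;/pre&gt;
&lt;p&gt;The two binary searches stand in for an index seek, so no row outside the subtree is ever examined. &lt;/p&gt;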
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: How do you build and visualize them? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;QC&lt;/b&gt;: You tell us about them. You express the form of your hierarchy, and you populate the records accordingly. But I don't think there's a tool yet. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: So there's the filestream piece, and the hierarchical ID piece, and then the Win32 namespace piece is the shoe that hasn't yet dropped? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;QC&lt;/b&gt;: That's right. In the next release we anticipate putting those two things together, the filesystem piece and the hierarchical ID piece, into a supported namespace. So you'll be able to type //machinename/sharename, up pops an Explorer window, drag and drop a file into it, go back to the database, type SELECT *, and suddenly a record appears. &lt;/p&gt;
&lt;p&gt;Potential uses for that? It's all over the place. Take our own expense reports. We used to have these Excel form templates, and you'd fill it out and submit it to some system. Then we hit a phase where it was all online, so you're on the plane home and too bad for you. But imagine they could reintroduce that template again, and you could save that Excel file directly into the database. &lt;/p&gt;
&lt;p&gt;Or more importantly, if you go to edit the thing, you don't have this process where you've taken a copy of the thing, you're editing it, you're sending it back through a mid-tier system that then has to reconcile the database records with the filesystem records. I can just say, oh, I need to add three more things. I double-click, and yes I'm still interacting with some web-based app, but the links I get are real Win32 links. I open the thing, I edit it, I stick it back, everything knows that it was changed within the right transactional semantics. &lt;/p&gt;
&lt;p&gt;People are constantly having to bridge between the file world, and the world of data around the files. Providing Win32 support gives developers the opportunity to allow the desktop clients to directly interact with a file that's part of some application, without having to go through all the semantics of the mid-tier. &lt;/p&gt;
&lt;p&gt;Are there always going to be some applications that will want to have mid-tier control over every aspect of every part of every workflow? Of course. But from a productivity standpoint, to be able to allow people to build applications more quickly, to be able to customize applications and not have to manage all those semantics themselves, that's huge. &lt;/p&gt;
&lt;p&gt;Sync is another topic, but imagine we build the right things around synchronization, so people can take the files offline. It's a major productivity gain. As a developer, you know the consistency of the world you're dealing with. You're not having to create and manage and upload and deal with copying all on your own. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: You've alluded to the downside already, which is that it now becomes a new data management discipline, familiar neither to people from the filesystem world nor to people from the database world. It's a hybrid, and that's an obstacle. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;QC&lt;/b&gt;: Sure, there's a learning curve, as with any other new technology. &lt;/p&gt;
&lt;p&gt;So, that's the filesystem piece, and I'm really proud of the work we've done there. We're introducing the kernel driver in 2008, we're giving people this nice marriage between the two worlds, and then we get to take that next step in the next release and give people the complete picture. &lt;/p&gt;
&lt;p&gt;I can live with the argument that we don't have integrated storage yet. Yes, we have filestream blobs in the database, which is a big step. We have the performance and the database consistency all in one package, and that's a huge step forward. But when we have Win32, at that point, unarguably, we have integrated storage. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: How do you think that plays out as the center of gravity shifts toward the cloud? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;QC&lt;/b&gt;: There is no app in the world that doesn't need a database. Every cloud app has one under the covers somewhere. One thing we've learned in the last few years is that the fuzziness between structured data and unstructured data is just increasing. The major online apps that I interact with have both. You know, Hotmail has attachments. And they have limitations on attachments because they have trouble managing sizes and whatever else. &lt;/p&gt;
&lt;p&gt;We have things now where people can create some space, put some files up there, but man, if you want any metadata around those files, too bad, it's just a dumb blob store. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: What I'm getting to here is that, well, part of the challenge for WinFS as originally conceived, with a heavy client component, was: How do you get the network effects? Five years later the center of gravity has shifted, there are shared spaces in the cloud where those effects can happen. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;QC&lt;/b&gt;: Yes. And I think the technology we're building is underlying technology for the cloud apps. All of our major properties are built on SQL, and they want to use this stuff, we have work going on there, pre-release work to take advantage of these features, because they want them. &lt;/p&gt;
&lt;p&gt;From a business standpoint, my first concern is how to provide value to our customers. And those are our customers. The people building the cloud apps are our customers. &lt;/p&gt;
&lt;p&gt;Now, beyond that, one of the things we used to say about WinFS was that it was the world's best mashup playground, because you had all the data in one place. In the mashup world you're talking to one service at a time. &lt;/p&gt;
&lt;p&gt;Do I think that the opportunity to build applications that solve real end-user problems building on technology like this continues to thrive? Sure. &lt;/p&gt;
&lt;p&gt;When I think about the enterprise space, which is primarily where we sell SQL, they want this. They want a repository, and they want it not to be restricted on the types of data it has. &lt;/p&gt;
&lt;p&gt;You'd be surprised, SQL's behind some of the biggest cloud services on the planet. And our customers who are building them have been struggling with this structured-versus-unstructured data problem. &lt;/p&gt;
&lt;p&gt;Filestream alone gives them the answer. They don't so much need the Win32 aspect, because they have enough app development expertise in the mid-tier to bridge this stuff reasonably well. But they do want the transactional and backup consistencies that filestream gives them. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: Is that ultimate mashup playground also a good environment in which to iteratively work out what some key schemas need to be? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;QC&lt;/b&gt;: Yeah, that leads to another interesting point. Going through the litany of technologies that have come from WinFS, one of them is the notion of what I refer to as semi-structured records. The schema is not necessarily all that well defined at the outset of the application. How does the database handle that? We had built WinFS around a feature called UDTs, which is a column type -- a CLR type system type. &lt;/p&gt;
&lt;p&gt;We finished that up, and we built a whole spatial datatype on it in SQL Server 2008, it's all good stuff. &lt;/p&gt;
&lt;p&gt;But when we stepped back and looked at the semi-structured data problem in a larger context, beyond the WinFS requirements, we saw the need to extend the top-level SQL type system in that way. Not just UDTs, but to have arbitrary extensibility. &lt;/p&gt;
&lt;p&gt;So we did this feature in SQL Server 2008 that we internally refer to as sparse columns. It's a combination of various things. First, a large number of columns. Right now there's a 1024 limit on the number of columns in a single SQL table. We're way widening that out. &lt;/p&gt;
&lt;p&gt;That comes of course with the ability to store data that's very sparsely populated across a large number of columns. In SQL Server 2005 we actually allocate space for every column in every row, whether it's filled or not. &lt;/p&gt;
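&lt;p&gt;&lt;i&gt;Sketch:&lt;/i&gt; a small Python comparison of the two layouts described above, under the assumption that a "slot" is the unit of storage. It is only an illustration of the idea, not SQL Server's on-disk format. The column names, records, and helper functions are all hypothetical. &lt;/p&gt;
&lt;pre&gt;
```python
# Hypothetical wide table: many declared columns, few populated per row.
COLUMNS = ["name", "email", "fax", "pager", "telex", "nickname"]

records = [
    {"name": "Ada", "email": "ada@example.com"},
    {"name": "Grace", "nickname": "Amazing Grace"},
    {"name": "Edsger"},
]


def dense_row(record):
    """Fixed layout: one slot per declared column, filled or not."""
    return [record.get(column) for column in COLUMNS]


def sparse_row(record):
    """Sparse layout: keep (column index, value) only for populated columns."""
    return [(COLUMNS.index(column), value) for column, value in record.items()]


def read(sparse, column):
    """Look up one column in a sparse row; absent columns read as None."""
    return dict(sparse).get(COLUMNS.index(column))


dense_slots = sum(len(dense_row(r)) for r in records)
sparse_slots = sum(len(sparse_row(r)) for r in records)
print(dense_slots, sparse_slots)
```
&lt;/pre&gt;
&lt;p&gt;With these three records the dense layout spends a slot on every declared column, while the sparse layout spends one only where a value exists, and the gap grows as more rarely used columns are added. &lt;/p&gt;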
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: This is what the semantic web folks are interested in, right? Having attributes scattered through a sparse matrix? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;QC&lt;/b&gt;: That's right. And that leads to another thing which we call column groups, which allow you to clump a few of them together and say, that's a thing, I'm going to put a moniker on that and treat it as an equivalence class in some dimension. &lt;/p&gt;
&lt;p&gt;Then we have something called filter indices, where instead of creating an index that spans all the records in a table, you can specify what records it applies to. &lt;/p&gt;
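&lt;p&gt;&lt;i&gt;Sketch:&lt;/i&gt; a minimal Python model of an index that covers only the rows matching a predicate, as described above. The table contents and the build_index helper are hypothetical, and the dictionary is just a stand-in for a real index structure. &lt;/p&gt;
&lt;pre&gt;
```python
from collections import defaultdict

# Hypothetical table: row id mapped to a record. Only contacts have a city.
table = {
    1: {"kind": "contact", "city": "Seattle"},
    2: {"kind": "document", "city": None},
    3: {"kind": "contact", "city": "Redmond"},
    4: {"kind": "contact", "city": "Seattle"},
    5: {"kind": "photo", "city": None},
}


def build_index(table, column, predicate=None):
    """Map column value to row ids, covering only rows that pass `predicate`."""
    index = defaultdict(list)
    for row_id, record in table.items():
        if predicate is None or predicate(record):
            index[record[column]].append(row_id)
    return dict(index)


# A conventional index spans every row, including rows with no city at all.
full_index = build_index(table, "city")

# A filtered index spans only the rows the predicate selects.
filtered_index = build_index(table, "city", lambda r: r["kind"] == "contact")

print(full_index)
print(filtered_index)
```
&lt;/pre&gt;
&lt;p&gt;The filtered index is smaller and never has to track the rows it does not apply to, which matters most when the indexed column is populated for only a small share of a wide, sparse table. &lt;/p&gt;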
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: When it's really cheap to make lots of those equivalences, you get the ability to let people call things however they want to call them. There can be lots of aliases and labels floating around, and people can have their own vocabularies. You don't have to be so rigid about names. As you discover equivalences, you map them, and that's very efficient. Versus trying to get people in committees to agree how to call things, that's the hardest problem in the world. But if you can let people operate in their own semantic namespaces, and then bridge things together... &lt;/p&gt;
&lt;p&gt;&lt;b&gt;QC&lt;/b&gt;: And that gets back to why the entity data model is so important. It lets people have their own way of describing, programming to, and interacting with the data they want to deal with. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: Now what about relationships? In WinFS, a relationship among entities was a first-class object. How does that carry forward? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;QC&lt;/b&gt;: The notion of a relationship is a first-class object in the entity data model. Now what we haven't done there is bridged an understanding of that into the database itself. Can the query processor understand a relationship, and be optimal for navigating through those semantics? We haven't bridged that part of the world yet. It's certainly possible to create database schemas that allow you to have good query efficiency through your entity model, but it's still intellectual work. We'd like it to be so that the database can look at an EDM schema and create at least the appropriate indices so when you are examining things through that lens, we can make sure your experience is optimal. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;QC&lt;/b&gt;: Finally there's synchronization. It went through a classic computer-science learning curve as well. At first we said, we need to sync with the cloud, with other WinFS instances, with server systems, how hard can this be? &lt;/p&gt;
&lt;p&gt;Then we quickly realized how hard this was. What should be more infamous than people breaking their pick on integrated storage is people breaking their pick on multimaster replication. It's an incredibly difficult problem to get right. &lt;/p&gt;
&lt;p&gt;Apps that have gotten this right for a particular domain have become wildly popular. Lotus Notes got it right for a particular domain, so did Exchange and Outlook, but a generalized solution has been very elusive. &lt;/p&gt;
&lt;p&gt;Anyway, we did a partnership with Microsoft Research, and at some point along the arc we solved it fairly well. It's not trivial. This is not something that ends up being a simple solution to this very complex problem. It's actually reasonably sophisticated, but it works, and we built it in as part of the last WinFS beta. &lt;/p&gt;
&lt;p&gt;As they realized they were onto something, they started to fork out a componentized version of it that's now finding its way into a bunch of Microsoft products. The official branding is Microsoft Sync Framework. I think they're on target for shipping it in six different products, and for embedding it all over the place. &lt;/p&gt;
&lt;p&gt;Building an app like Outlook, from scratch, is hard. You can always interact with your data, when you're connected the thing will always synchronize and reconcile, when it's offline it still provides a consistent experience. To build that from scratch, it's really hard. Taking the sync framework allows people to go and build that experience without having to solve the hard multimaster synchronization problems. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;QC&lt;/b&gt;: Finally, we'd done a bunch of work to keep the SQL engine tamed and behaving properly on the desktop. Some of that has found its way into SQL Server 2008 and some has not, because there's a less pressing need for it. But for departments, and for SQL Server Express on the desktop, we still want to finish that. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: So to wrap up, I'd like you to reflect on how the original environment for WinFS was the end-user desktop, but now the environment in which many of these technologies have come to fruition is the enterprise datacenter and backoffice. How might these worlds yet come together? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;QC&lt;/b&gt;: I was very happy to be able to take the technology forward, because I saw the broad applicability, not just in the problem space we were working on, but in terms of the general usefulness of the database. &lt;/p&gt;
&lt;p&gt;My job is to grow the usefulness of the database. The work we did with WinFS was in line with that, and I'm happy with that, but there's a part of me which is still unfulfilled. Boy, what would it mean if every application could have some shared notions about, for example, the people in my life, that other applications could plug into and use. &lt;/p&gt;
&lt;p&gt;Can we express that fully in a cloud way? Maybe. It harkens back to the old Hailstorm ideas. And we have things like Astoria [ADO.NET Data Services], which is a projection of entities over the web. That's awfully familiar, both in terms of WinFS and in terms of Hailstorm. &lt;/p&gt;
&lt;p&gt;Where it goes, I don't know. We've made a choice right now to incubate some underlying platform technologies for the web, and allow the operating system team to cycle on the stuff that's on their plates right now. &lt;/p&gt;
&lt;p&gt;But I think not too long from now we'll come out of those cycles and say, OK, we have all this fundamental technology, what's the next big innovation we can do? &lt;/p&gt;
&lt;p&gt;That's kind of where we got tripped up in the Longhorn cycle. We were building too much of the house at once. We had guys working on the roof while we were still pouring concrete for the foundation. &lt;/p&gt;
&lt;p&gt;At one point we realized we needed to decouple things. And that really did give this team the freedom to go off and take these underlying technologies, which we believe were fundamental to the database, and get them done correctly. &lt;/p&gt;
&lt;p&gt;But I do at some point want to see that place in my heart fulfilled around the shared data ecosystem for users, because I believe the power of that is enormous. &lt;/p&gt;
&lt;p&gt;I think we'll get there. But for now we'll let the concrete dry, and get the framing in place, and then we'll see how the rest of the house shapes up. &lt;/p&gt;</description><comments>http://perspectives.on10.net/blogs/jonudell/Where-is-WinFS-now/</comments><link>http://perspectives.on10.net/blogs/jonudell/Where-is-WinFS-now/</link><pubDate>Thu, 15 May 2008 13:20:00 GMT</pubDate><guid isPermaLink="true">http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/winfs/winfs.wma</guid><evnet:views>1989</evnet:views><evnet:viewtrackingurl>http://perspectives.on10.net/blogs/jonudell/Where-is-WinFS-now/WebViewBug.aspx?EVT=0</evnet:viewtrackingurl><evnet:previewtext>WinFS was an ambitious effort to embed an integrated storage engine into the Windows operating system, and use it to create a shared data ecosystem. Although WinFS never shipped as a part of Windows, many of the underlying technologies have shipped, or will ship, in SQL Server and in other products. 
In this interview Quentin Clark traces the lineage of those technologies back to WinFS, and forward to their current incarnations...</evnet:previewtext><media:group><media:content url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/winfs/winfs.mp3" expression="full" duration="3240" fileSize="25914048" type="audio/mp3" medium="audio" /><media:content isDefault="true" url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/winfs/winfs.wma" expression="full" duration="3240" fileSize="26213725" type="audio/x-ms-wma" medium="audio" /></media:group><enclosure url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/winfs/winfs.wma" length="26213725" type="audio/x-ms-wma" /><dc:creator>Jon Udell</dc:creator><slash:comments>1</slash:comments><wfw:commentRss>http://perspectives.on10.net/blogs/jonudell/Where-is-WinFS-now/RSS/</wfw:commentRss><trackback:ping>http://perspectives.on10.net/blogs/jonudell/22378/Trackback.aspx</trackback:ping><category>WinFS</category></item><item><title>OpenSearch federation with Search Server 2008</title><description>&lt;p&gt;With the new OpenSearch-based federation capability in Search Server 2008, you can integrate any external search service that can expose results as an RSS feed. In this podcast Jon Udell discusses search federation with Richard Riley and Keller Smith. &lt;/p&gt;
&lt;table&gt;
    
        &lt;tr&gt;
            &lt;td&gt;&lt;img alt="" src="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/searchserver/rriley.jpg" /&gt;
            &lt;p&gt;&lt;b&gt;Richard Riley&lt;/b&gt; is a Senior Technical Product Manager for Microsoft Office SharePoint Server 2007. He is responsible for driving Technical Readiness both within and outside of Microsoft and specializes in the Enterprise Content Management and Search features of the product. &lt;/p&gt;
            &lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt; &lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;&lt;img alt="" src="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/searchserver/kells.jpg" /&gt;
            &lt;p&gt;&lt;b&gt;Keller Smith&lt;/b&gt; is a Program Manager in the Business Search Group at Microsoft. He designs and manages new enterprise search features in the areas of Federation and End-User UI. His passion has always been to improve the lives of users through exciting new ideas in software. &lt;/p&gt;
            &lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;&lt;hr /&gt;
            &lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;&lt;b&gt;Links&lt;/b&gt;&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;
            &lt;p&gt;&lt;a href="http://blogs.msdn.com/enterprisesearch/default.aspx"&gt;Enterprise Search Blog&lt;/a&gt;&lt;/p&gt;
            &lt;p&gt;&lt;a href="http://www.microsoft.com/enterprisesearch/connectors/federated.aspx"&gt;Search Gallery&lt;/a&gt; &lt;/p&gt;
            &lt;p&gt;&lt;a href="http://www.microsoft.com/enterprisesearch/connectors/federated.aspx"&gt;Location Definition File Schema&lt;/a&gt; &lt;/p&gt;
            &lt;/td&gt;
        &lt;/tr&gt;
    
&lt;/table&gt;
&lt;p&gt;&lt;b&gt;Q:&lt;/b&gt; What's the lineage of this search server? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;A:&lt;/b&gt; The technology that was built into Index Server, way back in the NT4 option pack, has grown and diversified into various products, including desktop search and SharePoint. They've split apart now, but the common DNA is there. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;Q:&lt;/b&gt; What differentiates this search server from its predecessor? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;A:&lt;/b&gt; We found that customers wanted to use the search capability without buying the whole SharePoint product. So we split the search features into Microsoft Office SharePoint Server for Search. People could buy that and use the search features without the full MOSS functionality. Search Server is the next version of that. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;Q:&lt;/b&gt; What were the domains over which MOSS 2007 could search? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;A:&lt;/b&gt; Anything you could crawl. Out of the box, SharePoint plus other content sources we had handlers for, including Notes. Or you could go to the effort of writing your own protocol handler, or business data connection. But if you couldn't find a way to index it yourself, there was no way to connect to the data. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;Q:&lt;/b&gt; So how does federation change the game? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;A:&lt;/b&gt; Instead of indexing the content, you're leveraging an external search engine that already exists. That engine returns results back in an XML format we can render. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;Q:&lt;/b&gt; I was fascinated to learn you're using the OpenSearch mechanisms and formats to accomplish this. I did an early implementation for Amazon A9, and it was trivial since I already had an RSS feed coming out of the search engine I wanted to integrate. Is that still how it works? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;A:&lt;/b&gt; Yes. Any search engine that emits an RSS feed, you can connect to. It takes about 5 minutes to set it up. You take the query URL, put it into a federated location definition (FLD) file, and away you go. &lt;/p&gt;
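&lt;p&gt;The query URL step can be sketched roughly as follows. This is an illustration only: the host name and the helper function are hypothetical, and the one thing taken from OpenSearch itself is the {searchTerms} placeholder that a query template uses. &lt;/p&gt;

```python
from urllib.parse import quote

# Hypothetical query URL template for an external engine that returns RSS.
# {searchTerms} is the placeholder OpenSearch templates use for the query.
TEMPLATE = "http://search.example.com/results.rss?q={searchTerms}"


def build_query_url(template, terms):
    """Substitute URL-encoded search terms into an OpenSearch-style template."""
    return template.replace("{searchTerms}", quote(terms, safe=""))


print(build_query_url(TEMPLATE, "live mesh"))
```

&lt;p&gt;A federated location definition would carry a template like this alongside the other settings the answer mentions; the sketch covers only the substitution step. &lt;/p&gt;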
&lt;p&gt;&lt;b&gt;Q:&lt;/b&gt; I guess the part of OpenSearch people will be most familiar with is the description that drives the search drop-downs in browsers. It's a little package of XML that defines the template for the query. You must be using that in Search Server as well, when it acts as a client to federated sources. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;A:&lt;/b&gt; Yes, exactly. SharePoint is behaving as a client, just as IE is. When you create a federated location definition, you're creating one of these OpenSearch description files. But, we add some schema changes for the triggers that SharePoint uses to know when to send queries to that location. And we add the XSL used to render the results. So we extend the OpenSearch schema to make it more useful to SharePoint. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;Q:&lt;/b&gt; When you start shipping queries over the net to multiple federated sources, you start running into issues of sequencing and latency. How do you deal with that? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;A:&lt;/b&gt; You add federated locations as web parts. And you can choose whether to load them synchronously or asynchronously. Everything synchronous will be loaded first, and then the queries are sent off to each asynchronous web part. &lt;/p&gt;
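&lt;p&gt;The loading order described here can be sketched in a few lines of Python. This is not how Search Server is implemented; fetch is a stand-in for a real federated query, and every name below is an assumption made for illustration. &lt;/p&gt;

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch(location):
    """Stand-in for sending a query to one federated location."""
    return "results from " + location


def load_page(sync_locations, async_locations):
    """Load synchronous locations first, in order, then the asynchronous ones."""
    rendered = [fetch(location) for location in sync_locations]
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(fetch, location) for location in async_locations]
        # Asynchronous results are appended in whatever order they finish.
        for finished in as_completed(futures):
            rendered.append(finished.result())
    return rendered


print(load_page(["local index"], ["news feed", "product search"]))
```

&lt;p&gt;The synchronous results always come first; the order of the remaining results depends on which source answers soonest. &lt;/p&gt;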
&lt;p&gt;&lt;b&gt;Q:&lt;/b&gt; And you'll use AJAX to weave results in as they arrive? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;A:&lt;/b&gt; Right. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;Q:&lt;/b&gt; One of the sources can be SQL Server. How does that work? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;A:&lt;/b&gt; You need a simple connector that exposes an RSS feed. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;Q:&lt;/b&gt; In the case of SQL Server, there's the option to do structured search. Can I pass through an XPath query? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;A:&lt;/b&gt; Well, it's up to you to write the connector. If you want to accept XPath in the query, and return results on that basis, it's your code. &lt;/p&gt;
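&lt;p&gt;As a minimal sketch of such a connector's output side, the Python below wraps rows that have already been fetched in an RSS 2.0 document. The function name, the row shape, and the example URLs are assumptions; how the rows are queried from the database is left out entirely. &lt;/p&gt;

```python
import xml.etree.ElementTree as ET


def rows_to_rss(query, source_url, rows):
    """Wrap (title, link, description) rows in a minimal RSS 2.0 document."""
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "Results for " + query
    ET.SubElement(channel, "link").text = source_url
    ET.SubElement(channel, "description").text = "Search results for " + query
    for title, link, description in rows:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = link
        ET.SubElement(item, "description").text = description
    return ET.tostring(rss, encoding="unicode")


feed = rows_to_rss(
    "invoices",
    "http://connector.example.com/",
    [("Invoice 1001", "http://connector.example.com/invoices/1001", "Open invoice")],
)
print(feed)
```

&lt;p&gt;Building the elements through ElementTree rather than by string concatenation means that special characters in the row data are escaped automatically. &lt;/p&gt;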
&lt;p&gt;&lt;b&gt;Q:&lt;/b&gt; What I like about this is that the act of creating an OpenSearch RSS feed on top of a source is just plain useful, independently of Search Server. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;A:&lt;/b&gt; Absolutely. We use that in SharePoint Search, and in Search Server too: you can get an RSS feed of any result set. It's great for alerting. Set up a fairly restricted search, and your RSS reader will get new items when they appear. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;Q:&lt;/b&gt; It's great that you're using OpenSearch this way. Was there any debate about it? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;A:&lt;/b&gt; There are many ways to connect to other sources, but we felt there was a need to federate out in a very lightweight way. OpenSearch already had a scheme that was relatively well adopted, and served our needs as a base, though we did extend it as I've mentioned. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;Q:&lt;/b&gt; How do I control the results display? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;A:&lt;/b&gt; You can customize the XSL, so anything you can retrieve from the source you can format in any way you want. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;Q:&lt;/b&gt; Can I extend the results metadata? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;A:&lt;/b&gt; Yes, you can override the OpenSearch defaults, specify which fields you care about, and use those in your XSL. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;Q:&lt;/b&gt; And, Search Server is free? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;A:&lt;/b&gt; Yes, just go download it from microsoft.com/enterprisesearch. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;Q:&lt;/b&gt; How far can you go with the free version? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;A:&lt;/b&gt; You can install the express version with either SQL Express or SQL Server. With SQL Express you can run up to 400 to 500 thousand documents. With SQL Server, you can run to millions. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;Q:&lt;/b&gt; What about federation? Will there be a cap on the number of sources? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;A:&lt;/b&gt; No limit on sources. The only difference is that the express version requires you to install all the search services onto a single server. With the licensed version you can spread those across machines. &lt;/p&gt;&lt;img src="http://perspectives.on10.net/blogs/jonudell/OpenSearch-federation-with-Search-Server-2008/WebViewBug.aspx?EVT=0" height="1" width="1" alt="" /&gt;</description><comments>http://perspectives.on10.net/blogs/jonudell/OpenSearch-federation-with-Search-Server-2008/</comments><link>http://perspectives.on10.net/blogs/jonudell/OpenSearch-federation-with-Search-Server-2008/</link><pubDate>Thu, 01 May 2008 14:59:00 GMT</pubDate><guid isPermaLink="true">http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/searchserver/searchserver.wma</guid><evnet:views>370</evnet:views><evnet:viewtrackingurl>http://perspectives.on10.net/blogs/jonudell/OpenSearch-federation-with-Search-Server-2008/WebViewBug.aspx?EVT=0</evnet:viewtrackingurl><evnet:previewtext>With the new OpenSearch-based federation capability in Search Server 2008, you can integrate any external search service that can expose results as an RSS feed. 
In this podcast Jon Udell discusses search federation with Richard Riley and Keller Smith.</evnet:previewtext><media:group><media:content url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/searchserver/searchserver.mp3" expression="full" duration="1448" fileSize="11913408" type="audio/mp3" medium="audio" /><media:content isDefault="true" url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/searchserver/searchserver.wma" expression="full" duration="1448" fileSize="12053033" type="audio/x-ms-wma" medium="audio" /></media:group><enclosure url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/searchserver/searchserver.wma" length="12053033" type="audio/x-ms-wma" /><dc:creator>Jon Udell</dc:creator><slash:comments>0</slash:comments><wfw:commentRss>http://perspectives.on10.net/blogs/jonudell/OpenSearch-federation-with-Search-Server-2008/RSS/</wfw:commentRss><trackback:ping>http://perspectives.on10.net/blogs/jonudell/22198/Trackback.aspx</trackback:ping><category>    Search Server 2008</category><category>   federation</category><category>   OpenSearch</category></item><item><title>Ray Ozzie introduces Live Mesh</title><description>&lt;img src="http://perspectives.on10.net/Link/70e4b624-1012-42d1-949f-0766affe328c/" border="0" /&gt;&lt;h1&gt;Introducing Live Mesh&lt;/h1&gt;
&lt;p&gt;
&lt;i&gt;
In this audio version of a &lt;a href="http://channel9.msdn.com/showpost.aspx?postid=399578"&gt;Channel 9 video&lt;/a&gt;, Ray Ozzie discusses his role as Microsoft's chief software architect, and the role of Live Mesh as one aspect of an emerging Internet-oriented platform. 
&lt;/i&gt;
&lt;/p&gt;

&lt;table&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;img width="250" src="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/ozzie-livemesh/ozzie01.jpg"&gt;
&lt;div&gt;
&lt;b&gt;Ray Ozzie&lt;/b&gt; is Microsoft's chief software architect.
&lt;/div&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;hr&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Links&lt;/b&gt;&lt;/p&gt;
&lt;div&gt;&lt;a href="http://www.mesh.com"&gt;mesh.com&lt;/a&gt;&lt;/div&gt;
&lt;div&gt;&lt;a href="http://channel9.msdn.com/ShowPost.aspx?PostID=399578"&gt;Video&lt;/a&gt; of this interview on Channel 9&lt;/div&gt;
&lt;div&gt;&lt;a href="http://channel9.msdn.com/ShowPost.aspx?PostID=399577"&gt;Abolade Gbadegesin&lt;/a&gt; on the architecture of Live Mesh&lt;/div&gt;
&lt;div&gt;&lt;a href="http://www.on10.net/blogs/nic/Hands-on-with-Live-Mesh/"&gt;Demo&lt;/a&gt; of Live Mesh on Channel 10&lt;/div&gt;
&lt;div&gt;Mike Zintel on &lt;a href="http://blogs.msdn.com/livemesh/archive/2008/04/21/live-mesh-as-a-platform.aspx"&gt;Live Mesh as a platform&lt;/a&gt;&lt;/div&gt;
&lt;div&gt;Background on &lt;a href="http://blog.jonudell.net/2007/12/07/from-simple-sharing-extensions-to-feedsync/"&gt;FeedSync&lt;/a&gt;&lt;/div&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;

&lt;p&gt;
&lt;hr /&gt;
&lt;/p&gt;

&lt;p&gt;
&lt;b&gt;JU&lt;/b&gt;:  Hello Ray! Thanks for joining us.
&lt;/p&gt;

&lt;p&gt;
&lt;b&gt;RO&lt;/b&gt;:  It is great to be here Jon.
&lt;/p&gt;

&lt;p&gt;
&lt;b&gt;JU&lt;/b&gt;: So, it's been about 3 years since you joined Microsoft, initially as CTO. People tend to wonder what it's like coming from a company of 300 to a company on the scale of Microsoft. 
&lt;/p&gt;

&lt;p&gt;
&lt;b&gt;RO&lt;/b&gt;:  I've had the luxury of a career working for small companies: Software Arts in the early days, and a couple of startups in Iris and Groove. But Lotus ended up being acquired by IBM, so I was at one big company before coming to Microsoft. It's tremendous in terms of the potential impact that someone can have.  I think everyone at Microsoft tends to be here because you want to have a tremendous impact, and certainly that was a tremendous draw.
&lt;/p&gt;
&lt;p&gt;
What I really do enjoy about the role as CSA, is being at the juncture of business strategy, product and market strategy, and technical strategy. I have the opportunity to work with not only the executive team on larger strategic issues, but also with the product teams at fairly detailed technical architectural levels. As an engineer, it is really fascinating, and I've met a lot of great people.  
&lt;/p&gt;

&lt;p&gt;
&lt;b&gt;JU&lt;/b&gt;: People also wonder what it's like to step into a role formerly occupied by Bill Gates. What kind of continuity will there be, and how might you want to reshape the role?
&lt;/p&gt;

&lt;p&gt;&lt;b&gt;RO&lt;/b&gt;: Bill is a very unique individual. There will never be another Bill.  He has got a tremendous palette of talents, both technical as he applied at Microsoft, and non-technical in the role he's moving into. In shaping the role after July, when he won't be here full time anymore, he really split the role into two pieces. Craig Mundie takes over long-term issues, research and things like that. And I have taken over most of the technical and product strategy related to products that'll ship within a couple years.  
&lt;/p&gt;

&lt;p&gt;
My background is different than Bill's was. I've been a lot more hands-on in the product design for a number of years. I'm dealing with broader issues than I've dealt with in the past, but my background in product development gives me a lot of grounding in terms of working with a development team. And I think Craig and I make a good pairing in terms of filling his shoes.
&lt;/p&gt;

&lt;p&gt;
&lt;b&gt;JU&lt;/b&gt;:  How do you balance the need to span a vast spectrum of activities and the need to go deep on things?
&lt;/p&gt;

&lt;p&gt;
&lt;b&gt;RO&lt;/b&gt;:  Time management -- attention management -- is really the biggest challenge. The pace is fairly brutal.  At the beginning of the year, I'll kind of plan out how much of my time in hours I want to spend in different categories of things.  There's some allocation for the rhythm of the business and high level strategic things.  There are allocations in terms of time I want to spend with product groups. 
&lt;/p&gt;

&lt;p&gt;
And then there's a fraction I didn't initially realize I had to be as intentional about, but sometimes you have to create white space because, like a task scheduler that has too many ready tasks, you can thrash if you spend all day dealing in a reactive mode to the incoming issues, the incoming communications.  Sometimes you have to create some white space in order to think and understand what is going on in the environment. I can do that by going away, by traveling to our international offices.  Bill had something called Think Week that we are continuing in a slightly modified form going forward. And there are other ways.
&lt;/p&gt;

&lt;p&gt;
&lt;b&gt;JU&lt;/b&gt;:  Is one of those ways maybe to sometimes focus deeply on particular interests of yours?  If so, what would some of those be?
&lt;/p&gt;

&lt;p&gt;
&lt;b&gt;RO&lt;/b&gt;:  Well, the problem is most of my interests are technically related and so in theory I would just go write some code. I don't do that anymore, though, and honestly the best way that I've found to clear my mind really is either to go to a conference that's a little off the beaten path, or just travel somewhere, maybe with my wife, that is not technology related, just to clear it out and re-prioritize.  It is probably something that everyone has to do. In the old days when I did code, I used to have a 4-hour rule that said: "Do not write code unless you can at least have 4 hours of contiguous time where you will not be interrupted." Otherwise you end up introducing more bugs than the code you are writing. In a way, this is kind of the life management equivalent.
&lt;/p&gt;

&lt;p&gt;
&lt;b&gt;JU&lt;/b&gt;:  So, in the talk that you gave at MIX, you introduced the interesting phrase "utility computing". I got to thinking that although "web 2.0" is the meme of the current era, people may have forgotten that for quite a while, Tim O'Reilly was actually trying to establish "Internet operating system" as a meme. That didn't really stick, and now it's come around again as "web 2.0", but "Internet operating system" is a pretty evocative phrase, as is "utility computing." We have talked about some things that are coming.  We're going to talk more about a part of that initiative here, the Live Mesh announcement, but I wonder if you could reflect a little bit on what an Internet operating system could be.
&lt;/p&gt;

&lt;p&gt;
&lt;b&gt;RO&lt;/b&gt;:  Maybe I should just step back and talk about the environment a little.  I mean, when I got into this business back in the 70s, utility computing was really hot, it was called the mainframe.  We had these raised floors, freon cooling. It really was a utility. We used virtualization. There was "time sharing".  They were boring terms, but we were used to treating computing as a utility, and the PC revolution was all about empowerment and kind of getting back some of that personal feel, getting control of building solutions for things that might be really meaningful for you.  
&lt;/p&gt;

&lt;p&gt;
So the pendulum swung to the personal, and then with the web, when the web first emerged, it's odd, the nature of how technology is shaped is based on the constraints of the environment, whether it is computing constraints, communication constraints, and so on. The early web grew up in an era of dial-up, of 56K dial-up, so a lot of the way the protocols are structured, where computing was located, were based on that balance of computation and storage on the back-end, a really thin straw, a smart terminal that we call a browser on the front end, and that's how it was born.
&lt;/p&gt;

&lt;p&gt;
Nowadays, we've got increasing ubiquity of broadband.  We've got this big fat pipe so we can send more data.  You still can't be chatty, but we can send more data back and forth, and it gives us, as architects, the ability to revisit what should be the right balance, for any given solution, between what's on the back end and what's on the front end.  We have amazing computation abilities on both sides.  We have amazing storage abilities on both sides.  So now, in this unconstrained environment, really the question is: what is the right way to build a solution?  Application models have begun to evolve that start to take advantage of some of these things on both sides, and I think really when we talk about utility computing now, what we are saying is, if you are building a solution now, what is the right way that back-end utility should expose its resources? What business models? What application design patterns are appropriate for the cloud? Map-reduce-like patterns, pure horizontally scalable patterns, are much better for that back-end. 
&lt;/p&gt;

&lt;p&gt;
What should the front-end programming model be like? We started with the PC in a model of one computer for some subset of users.  Bill and Steve's dream was to have a computer on every desk and in every home, and we have gotten to that point, but now we have gone beyond that point. Every individual has a phone and a PC.  Many people have multiple PCs at home.  They might have a PC at home, at work.  We have got computer-like devices in our cars, sitting underneath our TVs with the set-top box. People have home security systems.  There are lots of devices around, and I think now is the time to reflect, what is the right programming model for the client environment that we have got?  What is the right programming model for the cloud? And at least from Microsoft, how can we build tools and services to help developers build great businesses, to build great solutions, using both those back-end and front-end resources? 
&lt;/p&gt;

&lt;p&gt;
&lt;b&gt;JU&lt;/b&gt;:  From that perspective, we see some interesting offerings emerging in what we can broadly call the Internet operating system space.  Amazon surprised me quite a lot in the last couple of years in the things they have done. I don't think people were too surprised by the Google announcement more recently.  I would invite you to reflect on the kind of company that Microsoft historically has been, and therefore, the kind of approach that Microsoft can take to this problem as it might compare to the kind of approach that these other companies can take.  
&lt;/p&gt;


&lt;p&gt;
&lt;b&gt;RO&lt;/b&gt;:  I have no product announcements to make.  [laughs]
&lt;/p&gt;

&lt;p&gt;
&lt;b&gt;JU&lt;/b&gt;: I understand that. [laughs]
&lt;/p&gt;

&lt;p&gt;
&lt;b&gt;RO&lt;/b&gt;:  I'll just reflect on our approach, and compare and contrast if you like.  Microsoft's approach...I can tell you this because I was a developer for most of my career on the outside of Microsoft...I've been here for 3 years, but I had a relationship with Microsoft as an ISV since roughly the beginning of Microsoft.  I met Bill and Steve in 1981 when I was first coming out to talk to them about some DOS issues. Microsoft, I believe, has always taken a perspective of ... its DNA is as a platform company. And in order to have a successful platform, you've got to have successful ISVs, people who are being selfish about their solution, what they are trying to deliver, but we have to -- semi-selflessly and semi-selfishly -- serve those people.  We've got to build a good business, but we've got to do so by serving those people and letting developers build great businesses. So in any platform, any utility computing environment that we would consider, we would be taking a broader perspective.
&lt;/p&gt;
&lt;p&gt;
We would look at a 20- or 30-year horizon and say: How is this all panning out? What is the broad range of developers out there? What does the new-age ISV look like?  It's a web ISV. There are also client ISVs, but client code is changing, it has cloud interconnections now.  What does a VAR look like these days, a solution VAR?  What does an enterprise developer look like?  What is the enterprise environment going to look like when it's transitioned from an on-premises data center to one that factors in both an on-premises data center and the cloud?  Perhaps there would be some businesses, small to medium size businesses, that might shift completely to the cloud for their back end.  But most major enterprises would have some kind of hybrid. So when we step back and look at tools, languages, application design patterns, operating systems, and runtimes, we kind of look at it and say: How will we design this for the way that the environment will shape over the next 5, 10, 20 years? As opposed to what does the web look like today, what are the capabilities today.  I think Amazon has done a great thing in terms of opening people's eyes to the power of, coming from the ground up, what does it look like to make raw resources, raw VMs, or blobs, available to a developer.  I think they've done all of us a great service, and themselves.  Google's recent announcement, I think, is actually the inverse.  It's done a good service in terms of looking at an individual developer and saying: "Hey, for a specific problem, what is a very simple way of getting into this cloud game with a relatively constrained pattern and model, but doing it in a fairly slick, seamless way."   I think those are both interesting viewpoints and ultimately the answer that the broad developer audience wants will be a combination of those and many other things.  
&lt;/p&gt;

&lt;p&gt;
&lt;b&gt;JU&lt;/b&gt;: Good.  So in that context, we have just announced Live Mesh, and when I first saw it, I worried a little bit that people would see it in comparison to a lot of things which on the surface, it compares to. It can look like a FolderShare kind of thing, it can look like a screen-sharing kind of thing, it has those aspects. But in fact, this is one example of some platform-like capability for which those things are really trivial applications that have been layered on top. We &lt;i&gt;can&lt;/i&gt; talk about Live Mesh, so let's talk about it.
&lt;/p&gt;

&lt;p&gt;
&lt;b&gt;RO&lt;/b&gt;:  Live Mesh began with the perspective of saying, what is the environment that we are in today, and that we'll be in for the next who knows how many years. As users we're in a multi-device environment, and we need to cope with these many different devices.  Each one of us at home ends up being a kind of system integrator, if you want to get simple media sharing scenarios done between devices at home. You might have different contact-sharing things between your phone device and other things that you are dealing with on your web or on your PC.  If you're in a productivity context, you have document-sharing scenarios both among people and among devices.  
&lt;/p&gt;
&lt;p&gt;
Each one of us has had challenges, if we have multiple PCs and multiple devices, figuring out how to get the most recent version of that application installed on all the right devices that we're the system managers for. On the enterprise side, we've solved this quite well with things like SMS that let an enterprise push things out to many desktops and manage desktops, but we haven't actually solved that problem for individuals.  It hasn't been a huge pain point for individuals, but now it's becoming more of a pain point.  
&lt;/p&gt;
&lt;p&gt;
That's one aspect that started us down the path of Live Mesh.  We basically said, the OS as it is right now, the OS for the phone, the OS for the desktop, the Xbox, the OS for a Zune, the OS for the PC, are all designed more or less to expose the resources of that device to developers and to users, but they are really not designed in concert with other devices.  What is going on in the web is mostly done serving the web, and the browser is largely disconnected from those devices.  If we were designing an operating environment for users or developers today, looking forward, it would probably look a little bit different. It would look like something that would bring those devices together for the end user. And so that is one thing that Live Mesh does.  It brings together your devices. You use the web as a hub to claim your device.  You securely identify yourself as an authorized user of this device.  Multiple people can own a device as authorized users and each person can have many devices.  
&lt;/p&gt;
&lt;p&gt;
Once you've said that's your device, it enables many things.  It enables centralized health monitoring and status reporting.  It enables settings replication across your devices and computers where you think appropriate.  It lets data flow among those devices, whether files and folders, or other things that I will talk about in a minute, like feeds. And it lets applications be configured and potentially licensed across your device mesh.
&lt;/p&gt;
&lt;p&gt;
And in solving the problem of getting things to work across your devices, the same kind of technologies can be used for multiple people.  So if you share a folder of documents, if you are working on a set of documents on your desktop with someone else, those same technologies that are used to synchronize that folder across devices can be used for me to share with you or other people.  So from the user's perspective, we think that Live Mesh can really transform your experience with multiple PCs and things like your phone to make the experience very seamless in that way. 
&lt;/p&gt;
&lt;p&gt;
Now, coming from the developer's perspective, Live Mesh is actually a platform. What you see with Live Mesh when you download it is a very small piece, from the user's perspective, of what it actually is, because it was built to enable innovation in a variety of ways.  You can kind of think of what you see as the shell. If Windows or an OS has a broad set of capabilities that is exposed by its APIs to developers, the shell, the command line of an OS or the Finder or the desktop within Windows is a thin exposure of that to users.  For Live Mesh, file and folder synchronization is that small amount that gives the user a taste for the capabilities of this platform.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;JU&lt;/b&gt;:  You've talked with me before about a couple of distinct application patterns. I think one of them is going to be intuitively obvious to people because the folder and file synchronization thing is something that people already do. So people are going to kind of get that if you drop a thing here, it shows up there, and hopefully they'll be delighted to find out that subtler kinds of things than whole files and folders can also participate in that synchronization. And they'll be interested to see how they can then bring people into the equation, with sharing. All that I think is what people will get at first glance. I think what will be less obvious is the way in which websites can use Live Mesh to optimize the communication of stuff down to individuals and groups of individuals. Since that's less obvious maybe we should take a moment to spell that out.
&lt;/p&gt;

&lt;p&gt;
&lt;b&gt;RO&lt;/b&gt;:  Sure. You can look at it from two perspectives.  Live Mesh is a way of enabling rich applications on the PC to get their settings and their data across devices, as you just said. But it's also a way for websites to be able to efficiently extend their function down to a world of devices. The PC for sure, but also phones and other devices. One of the things that we inadvertently stumbled upon in Groove was that enterprises wanted to use this technology to help them extend the functions of their websites out to a world of devices. That isn't what Groove was designed to do.  It was more designed as a peer sharing mechanism.  So one of the things that Live Mesh is all about is essentially, from day one, providing a centralized infrastructure such that this platform that's on all of the clients goes to this one service in the cloud to manage, all under the covers, all the synchronization. Now the actual data may flow peer-to-peer, it might flow relayed through the cloud encrypted, but one thing that is for certain is that an arbitrary web site won't have to deal with the complexities of synchronization. They can develop an application, using technologies that they are familiar with -- web development technologies -- and develop a piece of that application that gets downloaded to the client, that has local storage synchronized with the web site, they can update the application and the updates get  distributed transparently...
&lt;/p&gt;

&lt;p&gt;
&lt;b&gt;JU&lt;/b&gt;: Or maybe it's just data. 
&lt;/p&gt;

&lt;p&gt;
&lt;b&gt;RO&lt;/b&gt;: Yep, could be data.
&lt;/p&gt;

&lt;p&gt;
&lt;b&gt;JU&lt;/b&gt;:
Let's talk about my bank's web site and my travel web site, two websites that I frequently do business with. In both cases there's data exchanged, and I would love for that data to be exchanged in a fully synchronized and reliable and transparent way.  What you're saying is that both of those web sites, and any other web sites that I transact with, can pretty straightforwardly get into the game of plugging their pipes into my mesh.
&lt;/p&gt;

&lt;p&gt;
&lt;b&gt;RO&lt;/b&gt;:  That's right.  They can plug their pipe into your mesh, and it's through mechanisms that most web developers these days are becoming increasingly familiar with, such as feeds. Some people might remember a few years ago there was a technology that we introduced, an extension to RSS called SSE (simple sharing extensions), that eventually matured -- with the help of the partner community -- into something we now refer to as FeedSync. It's essentially RSS and Atom extensions to a technology that was initially developed for publishing, where you have a list of items that get updated, and through a publish/subscribe mechanism, the updates get sent out. FeedSync extends that to make it bidirectional. You can essentially do crosswise subscriptions.  I subscribe to you and you subscribe to me on the same feed. We can both modify it, and make sense of the results, and understand how conflicts are dealt with.
&lt;/p&gt;
&lt;p&gt;
By using this very simple technology, we connect the web site to our cloud, our cloud to the clients, the clients to each other.  It is just a very simple thing. The base model of what is an application within the Live Mesh environment begins with essentially a feed of feeds.  One feed represents a logical thing that a site might be exchanging with the client. That's called a mesh object. It's a feed of feeds. A developer can new up one of these things, and its elements are other feeds. An application can develop as many feeds as it likes. Some of those feeds are hard-wired to be things like the list of members, or the list of devices in the feed, but then the application can develop many more.
&lt;/p&gt;
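&lt;p&gt;&lt;i&gt;
The two-way subscription described above can be sketched in a few lines. This is a simplified illustration in the spirit of FeedSync, not Live Mesh code: the SyncItem class, its field names, and the sample data are all assumptions made for the example. The rule shown is deterministic winner-picking (more updates wins, then the later timestamp, then the endpoint name), so that both sides reach the same result whichever direction the merge runs. A real implementation would also keep change history and preserve the losing version as a conflict, which this sketch leaves out.
&lt;/i&gt;&lt;/p&gt;
&lt;pre&gt;
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncItem:
    """One feed item plus simplified sync metadata (all names are illustrative)."""
    item_id: str
    updates: int   # how many times the item has been changed
    when: str      # ISO 8601 UTC timestamp of the latest change
    by: str        # name of the endpoint that made the latest change
    content: str


def pick_winner(a, b):
    """Choose between two versions of the same item.

    More updates wins; ties fall back to the later timestamp, then to the
    endpoint name, so both sides pick the same version independently.
    """
    return max(a, b, key=lambda item: (item.updates, item.when, item.by))


def merge(local, remote):
    """Merge two copies of a feed, keyed by item id."""
    merged = {item.item_id: item for item in local}
    for item in remote:
        current = merged.get(item.item_id)
        merged[item.item_id] = item if current is None else pick_winner(current, item)
    return sorted(merged.values(), key=lambda item: item.item_id)


# Hypothetical data: my laptop and a travel site both hold the same feed.
mine = [SyncItem("trip-1", 2, "2008-04-23T09:00:00Z", "laptop", "Seattle, aisle seat")]
yours = [
    SyncItem("trip-1", 3, "2008-04-23T09:05:00Z", "travel-site", "Seattle, window seat"),
    SyncItem("trip-2", 1, "2008-04-23T09:06:00Z", "travel-site", "Boston"),
]

result_a = merge(mine, yours)   # I subscribe to you
result_b = merge(yours, mine)   # you subscribe to me
print([item.content for item in result_a])
```
&lt;/pre&gt;
&lt;p&gt;&lt;i&gt;
Merging in either direction yields the same list, which is the property that lets both parties modify one feed and still converge.
&lt;/i&gt;&lt;/p&gt;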

&lt;p&gt;
&lt;b&gt;JU&lt;/b&gt;: The way I'm thinking about it is that the sub-feeds are basically custom objects. If it's a calendaring application, those might be calendar events. In banking applications, they might be transactions.  But the notion is that the same infrastructure that's synchronizing files and folders can also synchronize these custom objects in the same way.
&lt;/p&gt;

&lt;p&gt;
&lt;b&gt;RO&lt;/b&gt;:  That's exactly right. In what they see today, if you open up a folder, what we would refer to as a mesh-enabled folder -- it's one of these mesh objects.  And in essence every file within that folder is an element, it's an item within the feed.  The file itself is the enclosure. The metadata of the file -- its name, its modified date and so on -- are a standard schema that represents the item.  Then there's a news feed that you'll see on the right hand side if you open up one of these folders, that's another feed, and each of the entries in there is another item, and so on. 
&lt;/p&gt;
&lt;p&gt;
We expect that developers will develop feeds that suit the needs of their specific application, and we deal transparently with the synchronization of those elements. The user interface offers a very simple consistent way to help users manage conflicts -- if the application says that the user should be the one to deal with those conflicts.
&lt;/p&gt;
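&lt;p&gt;&lt;i&gt;
As a rough picture of the data model described above, a mesh-enabled folder can be modeled as a feed of feeds in which each file is an item whose enclosure is the file itself. The class names, field names, and paths below are hypothetical, chosen only for this sketch; they are not taken from the Live Mesh SDK.
&lt;/i&gt;&lt;/p&gt;
&lt;pre&gt;
```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """One entry in a feed: metadata plus an optional enclosure (the file itself)."""
    name: str
    modified: str
    enclosure: str = ""   # hypothetical location of the file's bytes


@dataclass
class Feed:
    """A named list of items."""
    title: str
    items: list = field(default_factory=list)


@dataclass
class MeshObject:
    """A feed of feeds: each element is itself a feed."""
    title: str
    feeds: dict = field(default_factory=dict)

    def feed(self, title):
        # Create the feed on first use, so an application can add as many as it likes.
        return self.feeds.setdefault(title, Feed(title))


folder = MeshObject("Trip photos")
folder.feed("files").items.append(Item("beach.jpg", "2008-04-20T18:30:00Z", "files/beach.jpg"))
folder.feed("news").items.append(Item("Jon added beach.jpg", "2008-04-20T18:30:05Z"))
folder.feed("members").items.append(Item("Jon", "2008-04-20T18:00:00Z"))

print(sorted(folder.feeds))
```
&lt;/pre&gt;
&lt;p&gt;&lt;i&gt;
A custom application would add its own feeds (calendar events, bank transactions) next to these, and the same synchronization would apply to them.
&lt;/i&gt;&lt;/p&gt;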

&lt;p&gt;
&lt;b&gt;JU&lt;/b&gt;:  There are a lot of interesting degrees of freedom here. My bank's website has a RESTful interface to this stuff, but so does my mesh client.  In fact, I think people will be surprised and delighted to discover that you can hit the localhost with REST calls -- and that's putting stuff in, as well as getting stuff out.
&lt;/p&gt;

&lt;p&gt;
&lt;b&gt;RO&lt;/b&gt;: Right. We made a decision, from an API perspective, that developers would prefer to learn one way of dealing with the mesh, and that the tooling would be easier if we had one way of dealing with the mesh. So the web version of Live Mesh, what's running up in the web, and what's running on the client, are symmetrical and the same code.  So on localhost, in a secure way, an application uses REST calls to invoke -- we call it MOE, the mesh operating environment -- or it calls cloud MOE to do what it needs to do.
&lt;/p&gt;

&lt;p&gt;
&lt;b&gt;JU&lt;/b&gt;:  I think I can figure that out. [laughs]
&lt;/p&gt;
&lt;p&gt;
So the synchronization piece is interesting.  You've obviously been around this track a few times before. This time around, how is it different, how has it evolved from things you did in Notes or in Groove?
&lt;/p&gt;

&lt;p&gt;
&lt;b&gt;RO&lt;/b&gt;:  Well one thing that's different is that I didn't build this software.  I sponsored it, I had a good degree of input.  But a very talented team, with talented leadership, formed and rose to the task. There is certainly a DNA trail you can follow from Plato at the University of Illinois to Lotus Notes to Groove and now into Live Mesh. And I was fortunate to have the opportunity to interact heavily with that team, when I had the time to do so, which was right after I came on board, when I was still in the CTO role.  But they took these base concepts and really ran with them, and developed it into a much richer thing than I could imagine.  
&lt;/p&gt;
&lt;p&gt;
But the DNA elements are the basic sync model, the basic interaction model. The biggest difference between Groove and Notes was that Groove embraced the concept of ad-hoc interaction much more in terms of inviting people into a shared environment.  So those invitation models are essentially borrowed from Groove into Live Mesh. So if you are a Groove user, you will feel very comfortable with that model in dealing with Live Mesh. 
&lt;/p&gt;
&lt;p&gt;
I hope people will be very pleasantly surprised with Live Mesh in terms of how it feels like there is almost nothing there.  It's very simple, even though it's complex under the hood, in order to actually accomplish this in a high-scale way and in a performant way and in a way that works across firewalls and home NATs and double NATs and things like that.  It's got very few knobs to turn and exposes itself in a fairly succinct way to the user.
&lt;/p&gt;

&lt;p&gt;
&lt;b&gt;JU&lt;/b&gt;:  So, Mesh.com is the place to go to check it out, but where do the developers find the SDK and everything they need to know to actually work with it?
&lt;/p&gt;

&lt;p&gt;
&lt;b&gt;RO&lt;/b&gt;:  We're bringing out the Live Mesh software right now because it's a preview, we need to begin getting user feedback, we need to begin testing the scale of the back-end.  You can architect and plan these things but you can't actually just light them up at hundreds of millions of users overnight.  So there's a progressive rollout that begins today. What you won't find on mesh.com is the developer kit. We're  beginning a series of systems design reviews with smaller sets -- but increasing sets -- of developers over the course of the summer. The official rollout of the dev platform, and broad availability of the dev platform, would be at our PDC, our professional developer conference, this fall.  So as a user, look at Live Mesh now. As a developer, stay tuned, look at the screencasts that we've done, they'll show what we can do from an application perspective, but really, come to the PDC, go to the PDC web site when it happens and play with it. Both from the perspective of extending a rich application to the web and to other devices, and also extending a website out to take advantage of the power of Windows.
&lt;/p&gt;

&lt;p&gt;
&lt;b&gt;JU&lt;/b&gt;: This has been extremely useful. Thanks, we appreciate it.
&lt;/p&gt;

&lt;p&gt;
&lt;b&gt;RO&lt;/b&gt;: It's been fun, thanks.
&lt;/p&gt;&lt;img src="http://perspectives.on10.net/blogs/jonudell/Ray-Ozzie-introduces-Live-Mesh/WebViewBug.aspx?EVT=0" height="1" width="1" alt="" /&gt;</description><comments>http://perspectives.on10.net/blogs/jonudell/Ray-Ozzie-introduces-Live-Mesh/</comments><link>http://perspectives.on10.net/blogs/jonudell/Ray-Ozzie-introduces-Live-Mesh/</link><pubDate>Wed, 23 Apr 2008 09:02:00 GMT</pubDate><guid isPermaLink="true">http://mschnlnine.vo.llnwd.net/d1/ch9/0/RayOzzieLiveMesh_ch9.wma</guid><evnet:views>1602</evnet:views><evnet:viewtrackingurl>http://perspectives.on10.net/blogs/jonudell/Ray-Ozzie-introduces-Live-Mesh/WebViewBug.aspx?EVT=0</evnet:viewtrackingurl><evnet:previewtext>&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In this audio version of a &lt;a href="http://channel9.msdn.com/showpost.aspx?postid=399578"&gt;Channel 9 video&lt;/a&gt;, Ray Ozzie discusses his role as Microsoft's chief software architect, and the role of Live Mesh as one aspect of an emerging Internet-oriented platform.</evnet:previewtext><media:group><media:content url="http://mschnlnine.vo.llnwd.net/d1/ch9/0/RayOzzieLiveMesh_ch9.mp3" expression="full" duration="2190" fileSize="17439973" type="audio/mp3" medium="audio" /><media:content isDefault="true" url="http://mschnlnine.vo.llnwd.net/d1/ch9/0/RayOzzieLiveMesh_ch9.wma" expression="full" duration="2190" fileSize="17638541" type="audio/x-ms-wma" medium="audio" /></media:group><media:thumbnail url="http://perspectives.on10.net/Link/69d6e055-0380-4c15-b897-30d7213a79d5/" height="240" width="320" /><media:thumbnail url="http://perspectives.on10.net/Link/70e4b624-1012-42d1-949f-0766affe328c/" height="64" width="85" /><enclosure url="http://mschnlnine.vo.llnwd.net/d1/ch9/0/RayOzzieLiveMesh_ch9.wma" length="17638541" type="audio/x-ms-wma" /><dc:creator>Jon Udell</dc:creator><slash:comments>0</slash:comments><wfw:commentRss>http://perspectives.on10.net/blogs/jonudell/Ray-Ozzie-introduces-Live-Mesh/RSS/</wfw:commentRss><trackback:ping>http://perspectives.on10.net/blogs/jonudell/22050/Trackback.aspx</trackback:ping><category>LiveMesh
</category></item><item><title>Word for scientific publishing</title><description>&lt;p&gt;Pablo Fernicola is a group manager at Microsoft. He runs a project focused on delivering tools and services for scientific and technical publishing, with a particular interest in the transition from print to electronic and web-based content, and its implications for collaboration, search, and content discovery in the future.&lt;br /&gt;
&lt;br /&gt;
In this interview, Pablo explains how a new add-in for Word, now available as a technical preview, helps authors and publishers of scientific articles work more effectively with one another, and with online archives like PubMed Central. &lt;/p&gt;
&lt;table&gt;
    
        &lt;tr&gt;
            &lt;td&gt;
            &lt;div&gt;&lt;img alt="" src="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/word-science-publishing/fernicola.jpg" /&gt; &lt;/div&gt;
            &lt;div&gt;&lt;b&gt;Pablo Fernicola&lt;/b&gt;&lt;/div&gt;
            &lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;&lt;hr /&gt;
            &lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;
            &lt;p&gt;&lt;b&gt;Links&lt;/b&gt;&lt;/p&gt;
            &lt;p&gt;&lt;a href="http://www.microsoft.com/mscorp/tc/scholarly-publishing.mspx"&gt;Technical Computing @ Microsoft - Scholarly Publishing&lt;/a&gt;&lt;/p&gt;
            &lt;p&gt;&lt;a href="http://www.microsoft.com/downloads/details.aspx?FamilyID=09C55527-0759-4D6D-AE02-51E90131997E&amp;displaylang=en"&gt;Download details for the Article Authoring Add-in&lt;/a&gt; &lt;/p&gt;
            &lt;p&gt;Pablo Fernicola's blog: &lt;a href="http://blogs.msdn.com/exscientia/"&gt;ex Scientia&lt;/a&gt;&lt;/p&gt;
            &lt;/td&gt;
        &lt;/tr&gt;
    
&lt;/table&gt;
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; Hi Pablo, thanks for joining us to talk about a new Word add-in for authors of scientific journal articles. It's an interesting story about applying the XML capabilities of Office, and also about the evolution of journal publishing. How did this project get started? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;PF:&lt;/b&gt; It's an incubation project. Three people had an idea: Jean Paoli, an XML pioneer, Jim Gray... &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; Oh really? I didn't know he had been involved. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;PF:&lt;/b&gt; Yes, he and Jean really pushed to get this started, and they both recruited me for this project. It's been a little over a year since Jim disappeared, and that was a big blow, considering his key role. &lt;/p&gt;
&lt;p&gt;And the third key person is Tony Hey. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; We should explain that Tony runs what's called the technical computing initiative, and is very involved in figuring out how Microsoft can help various people in the scientific community address computing and information management challenges. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;PF:&lt;/b&gt; Right. Scientific authors in many disciplines use Word to write articles. We looked into how to simplify the workflow, streamline the process, and lower the cost. And not just for the authors, but also for the journal publishers. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; It's been true for a long time in publishing, and not just scientific publishing, that there have been real challenges getting that Word content converted into the kinds of long-term formats we need: XML that's richly decorated with metadata. &lt;/p&gt;
&lt;p&gt;Publishers have tended to use strategies that involve giving people templates that try to use styles to control what's in the document. But since Word 2003, and especially since Word 2007, there have been a set of XML capabilities which have made possible a much more robust approach. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;PF:&lt;/b&gt; That's right. Before Word 2003, styles were the best you could do. And people got quite far by relying on them. But they were very fragile. When you copied and pasted, styles would bleed across. It was hard to disentangle that when you converted the file. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; That's part of the problem. And part of it is that, along with the content itself, there's a process involving the metadata, and that process is divided between the author and the journal publisher. It's a shared responsibility, and you need an information management system that embraces that division of labor. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;PF:&lt;/b&gt; Also: What kind of user interface do you present to these different groups? There are really three groups. First the authors, who are subject-matter experts but don't know anything about the publishing process, and shouldn't have to know. &lt;/p&gt;
&lt;p&gt;Second, the journal editors. They're also subject-matter experts, but they also know about the structure of the journal, and about the metadata they need to apply. &lt;/p&gt;
&lt;p&gt;And third, you have companies and vendors who do backend tools and services, as well as the folks who work on the electronic archives. With the move from print to electronic journals, the role of the archive becomes very significant. Either the journals have their own repositories, or you have centralized repositories at university libraries or larger institutions, for example the National Library of Medicine with PubMed Central, or Cornell with Arxiv.org. &lt;/p&gt;
&lt;p&gt;That group is very technical in terms of understanding file formats, elements, and properties. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; But even those folks shouldn't necessarily need to master all of that. They'd rather spend their time on math and physics, not the minutiae of XML publishing. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;PF:&lt;/b&gt; That's right. The way the pipeline is set up today, you start with a Word document, and then at a certain point you convert to XML, and from that point on, all the editing happens in an XML editor. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; So in biology and medicine, the format defined by the National Library of Medicine, and the one you're supporting in this Word add-in, is called the NLM DTD. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;PF:&lt;/b&gt; Yes. It's not only used by PubMed Central, but also a lot of the commercial publishers are using it for their archival format. And we're also seeing it used by publishers in other disciplines, for example law and social science. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; Really? It's general enough for that? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;PF:&lt;/b&gt; It is fairly general, and I'm really impressed by how the community related to scientific, technical, and medical publishing is not reinventing the wheel, but instead leveraging something that's in common use. &lt;/p&gt;
&lt;p&gt;A significant point is that the format usually does not encode any presentation elements. It's all about the semantics and the metadata, not about what font or background color. As you try to preserve data for the long term, for centuries from now, the presentation is not relevant, it's the content that matters. You can always generate a presentation from it. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; So as we see in the accompanying screencast, you've created an add-in that presents editing enhancements both for authors and for editors. The interface for the author helps that person fill in the template and also apply those metadata elements which are appropriate for the author to apply. Then there's a separate interface for the editor. Explain a bit about how this can change the workflow. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;PF:&lt;/b&gt; If you start from the author side, a key premise was requiring less effort to produce a valid document. You want to avoid having the author round-trip with the editor, back and forth, because they didn't fill in all the required information. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; And that happens a lot? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;PF:&lt;/b&gt; Yes. And it's not just failure to provide the required information. We want to make it easier to provide the correct information. Consider co-authors. You'll likely work with the same ones over and over. You want to avoid having to repetitively enter that information, and avoid having errors creep in. Remember: As we move to electronic publishing, search becomes key. It's the main way people will find articles. To have good search results, you need to know the information in the articles is good. If the last name of the author is misspelled, it's harder to find all the papers from that author. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; In terms of the consistency of author information, you can help with this Word add-in by normalizing the metadata editing process, but there still has to be a reliable disambiguated set of author names which are managed by the publishers, and ideally by a federation of publishers, and ultimately even more broadly than that. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;PF:&lt;/b&gt; Correct. If we look down the road, we see something like a global directory, but we're not there yet. We have to build up to that. When you look at the add-in, we're taking small steps that will get us to at least a better baseline than we have today. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; Or, given that the world is moving to that baseline anyway, will help make it quicker and easier to get there. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;PF:&lt;/b&gt; That's right. If we think of the authors, the key thing is to provide a very simple interface. As we consider features, if they look complicated we'll drop them. One of the prevailing rules is: Don't duplicate Word UI. If there's a way to do tables or equations or reference lists in the Word UI, we'll use those. We don't want to provide a lot of new UI for the authors to learn. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; What I find interesting, here, from a workflow perspective, is how people in different roles are touching different pieces of data and metadata. Historically that's been a one-way process. Once the article is converted into the NLM format, it's typically not available to go back to the author for editing in the original context. So the person at the journal has to be responsible for round-tripping change requests. &lt;/p&gt;
&lt;p&gt;Similarly with the editing of the metadata. The author might want to make some changes, the journal publisher might want to make some changes, and those things tend to happen in disparate environments. What this is showing is what has always been the promise of robust XML editing on the desktop. You can bring all these chores into a common environment. The unit of workflow, the document, is something that can flow to different people in different contexts, and be modified in different ways, but it hangs together as it moves through the process. &lt;/p&gt;
&lt;p&gt;That's a big deal, and it goes far beyond the specific domain of scientific and technical publishing. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;PF:&lt;/b&gt; Right. And in addition to keeping all the data together and providing a simple interface, publishers have told us that as they move to electronic-first, they expect the cycle times to shrink. With the current disconnected tools and formats, that's hard to achieve. If you want to make a quick revision and send it to the journal, it may be too late because they've started the process of conversion, and once that starts there's no stopping it. &lt;/p&gt;
&lt;p&gt;And to your point about other domains, people have told us they want to use this for things like grant requests as well, moving away from article content to other kinds of content that can benefit from the structure and validation. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; The problem is almost universal. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;PF:&lt;/b&gt; Yes, anytime you want to validate content, or preserve it for a long time, these capabilities are relevant. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; So 2003 was the first major deployment of XML capability for Office and for Word. We haven't yet seen as much use of that capability as I'd expected. Why? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;PF:&lt;/b&gt; The biggest challenge was that XML wasn't the default format. You had to have authors do special things to produce XML. Also, if you think of the NLM formats, they contain things that aren't part of normal Word content or UI. In Word 2003, extending the document content, or extending the UI, wasn't as easy as it has become in Word 2007. &lt;/p&gt;
&lt;p&gt;With Word 2007, you end up with a set of things, in a single installation, that bring all the enabling capabilities together at the same time and in the same place. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; So what you did have, in Word 2003, was user-defined schema, but you're saying that wasn't enough, and that the newer capability of including arbitrary chunks of XML is more flexible for this purpose? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;PF:&lt;/b&gt; Yeah. There's two parts to that. There's content within the document, so the ability to have new XML elements that are part of the document, and that's more robust and expressive in Word 2007's Open XML format. Then there's the ability to have other XML data packaged within the file. Custom XML is what that's usually called. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; And that's the method you're using for the journal metadata? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;PF:&lt;/b&gt; Right. And since this is all defined as part of the Open XML format, and since the packaging of the file follows the standard as well, developers can build their own tools to create metadata, access metadata, or even create the whole file. &lt;/p&gt;
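&lt;p&gt;&lt;i&gt;To make the packaging point concrete: an Open XML file is a zip package, so standard zip and XML libraries are enough to reach XML data stored inside it. The sketch below is an illustration under stated assumptions, not the add-in's code. It assumes custom XML parts sit under a customXml/ folder, as Word conventionally writes them, and the article-meta, journal-title, and surname element names are placeholders for whatever schema a publisher uses. It writes a toy package holding a single part (not a complete Word document) and reads the metadata back.&lt;/i&gt;&lt;/p&gt;
&lt;pre&gt;
```python
import io
import zipfile
import xml.etree.ElementTree as ET


def build_package(metadata):
    """Write a tiny zip package holding one custom XML part (not a complete .docx)."""
    root = ET.Element("article-meta")   # hypothetical root element name
    for key, value in metadata.items():
        ET.SubElement(root, key).text = value
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("customXml/item1.xml", ET.tostring(root))
    return buffer.getvalue()


def read_custom_xml(package_bytes):
    """Return {element name: text} gathered from every part under customXml/."""
    found = {}
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        for name in package.namelist():
            if name.startswith("customXml/") and name.endswith(".xml"):
                for child in ET.fromstring(package.read(name)):
                    found[child.tag] = child.text
    return found


package_bytes = build_package({"journal-title": "Example Journal", "surname": "Doe"})
metadata = read_custom_xml(package_bytes)
print(metadata)
```
&lt;/pre&gt;
&lt;p&gt;&lt;i&gt;A real document carries other parts in the same folder, so a production tool would select the part by its schema rather than by file name alone.&lt;/i&gt;&lt;/p&gt;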
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; So this is a first cut you're putting out for publishers to experiment with, and to help you refine the templates they'll deploy to authors? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;PF:&lt;/b&gt; Yes, this is a technology preview for evaluation and feedback. The idea is that the publishers will create the templates themselves. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; Who are you working with? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;PF:&lt;/b&gt; We're talking to many different journals, publishers, and archives. Each constituency has a different set of interests and requirements. Journal editors care a lot about the templates, but folks at PubMed Central and Arxiv care more about how the metadata gets validated. &lt;/p&gt;
&lt;p&gt;We expect a beta shortly, and a 1.0 release by late summer. It'll be a free add-in for Word. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; Well thanks Pablo. I fear that this will only seem interesting to the relatively small number of folks who have a direct interest in scientific, technical, and medical publishing. But I hope it will be apparent that it's much broader. You hinted at that when you mentioned that the NLM format, despite having been invented for the particular purposes of certain disciplines, is being taken up by people in legal and other disciplines. &lt;/p&gt;
&lt;p&gt;I'm excited about it because I care about publishing and metadata and robust information systems and open formats, and this brings all those things together. I'm glad to know that it's happening, and I'm glad you're working on it. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;PF:&lt;/b&gt; It's really proving the value proposition of XML, and showing how it's coming of age in a mainstream production environment. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; Yep. For those of us who've been thinking about this for a long time, there's been a tendency to get frustrated and feel like it'll never happen. But it just takes a while for things like this to make their way into the mainstream, and this is a great example of that. &lt;/p&gt;
&lt;p&gt;Well, thanks Pablo! &lt;/p&gt;
&lt;p&gt;&lt;b&gt;PF:&lt;/b&gt; OK, thanks! &lt;/p&gt;&lt;img src="http://perspectives.on10.net/blogs/jonudell/Word-for-scientific-publishing/WebViewBug.aspx?EVT=0" height="1" width="1" alt="" /&gt;</description><comments>http://perspectives.on10.net/blogs/jonudell/Word-for-scientific-publishing/</comments><link>http://perspectives.on10.net/blogs/jonudell/Word-for-scientific-publishing/</link><pubDate>Thu, 17 Apr 2008 15:30:00 GMT</pubDate><guid isPermaLink="true">http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/word-science-publishing/fernicola.wma</guid><evnet:views>505</evnet:views><evnet:viewtrackingurl>http://perspectives.on10.net/blogs/jonudell/Word-for-scientific-publishing/WebViewBug.aspx?EVT=0</evnet:viewtrackingurl><evnet:previewtext>&lt;br /&gt;
&lt;br /&gt;
Pablo Fernicola is a group manager at Microsoft. He runs a project focused on delivering tools and services for scientific and technical publishing, with a particular interest in the transition from print to electronic and web-based content, and its implications for collaboration, search, and content discovery in the future.&lt;br /&gt;
&lt;br /&gt;
In this interview he explains how a new Word add-in, now available as a technical preview, helps authors and publishers of scientific articles work more effectively with one another, and with online archives like PubMed Central.</evnet:previewtext><media:group><media:content url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/word-science-publishing/fernicola.mp3" expression="full" duration="1780" fileSize="14245440" type="audio/mp3" medium="audio" /><media:content isDefault="true" url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/word-science-publishing/fernicola.wma" expression="full" duration="1780" fileSize="14420011" type="audio/x-ms-wma" medium="audio" /></media:group><enclosure url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/word-science-publishing/fernicola.wma" length="14420011" type="audio/x-ms-wma" /><dc:creator>Jon Udell</dc:creator><slash:comments>3</slash:comments><wfw:commentRss>http://perspectives.on10.net/blogs/jonudell/Word-for-scientific-publishing/RSS/</wfw:commentRss><trackback:ping>http://perspectives.on10.net/blogs/jonudell/21987/Trackback.aspx</trackback:ping><category>office xml</category><category>publishing</category><category>science</category><category>Word</category></item><item><title>Pablo Fernicola demonstrates the Word add-in for scientific authors</title><description>&lt;img src="http://perspectives.on10.net/Link/6ac56c1b-026a-49f3-9108-fbfd6ee5da7a/" border="0" /&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In this screencast, Pablo Fernicola demonstrates the technical preview of a new scientific publishing add-in for Word. The add-in enables reading and writing of XML-based documents in the archival format used by the National Library of Medicine. &lt;br /&gt;&lt;img src="http://perspectives.on10.net/blogs/jonudell/Pablo-Fernicola-demonstrates-the-Word-add-in-for-scientific-authors/WebViewBug.aspx?EVT=0" height="1" width="1" alt="" /&gt;</description><comments>http://perspectives.on10.net/blogs/jonudell/Pablo-Fernicola-demonstrates-the-Word-add-in-for-scientific-authors/</comments><link>http://perspectives.on10.net/blogs/jonudell/Pablo-Fernicola-demonstrates-the-Word-add-in-for-scientific-authors/</link><pubDate>Thu, 17 Apr 2008 15:29:00 GMT</pubDate><guid isPermaLink="true">http://perspectives.on10.net/blogs/jonudell/Pablo-Fernicola-demonstrates-the-Word-add-in-for-scientific-authors/</guid><evnet:views>189</evnet:views><evnet:viewtrackingurl>http://perspectives.on10.net/blogs/jonudell/Pablo-Fernicola-demonstrates-the-Word-add-in-for-scientific-authors/WebViewBug.aspx?EVT=0</evnet:viewtrackingurl><evnet:previewtext>







In this screencast, Pablo Fernicola demonstrates the technical preview of a new scientific publishing add-in for Word. The add-in enables reading and writing of XML-based documents in the archival format used by the National Library of Medicine. </evnet:previewtext><media:group><media:content url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/word-science-publishing/fernicola.wmv" expression="full" duration="618" fileSize="8537475" type="video/x-ms-wmv" medium="video" /><media:content url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/word-science-publishing/fernicola.wmv" expression="full" duration="618" fileSize="8537475" type="video/x-ms-wmv" medium="video" /></media:group><media:thumbnail url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/word-science-publishing/screencast.jpg" height="240" width="320" /><media:thumbnail url="http://perspectives.on10.net/Link/6ac56c1b-026a-49f3-9108-fbfd6ee5da7a/" height="64" width="85" /><dc:creator>Jon Udell</dc:creator><slash:comments>1</slash:comments><wfw:commentRss>http://perspectives.on10.net/blogs/jonudell/Pablo-Fernicola-demonstrates-the-Word-add-in-for-scientific-authors/RSS/</wfw:commentRss><trackback:ping>http://perspectives.on10.net/blogs/jonudell/21986/Trackback.aspx</trackback:ping><category>office xml</category><category>publishing</category><category>science</category><category>Word</category></item><item><title>Making sense of C02 data: A Microsoft/Berkeley collaboration</title><description>&lt;p&gt;&lt;i&gt;
In this podcast, MSR researcher Catharine van Ingen and Berkeley micrometeorologist Dennis Baldocchi talk with Jon Udell about their collaboration on &lt;a href="http://fluxdata.org"&gt;www.fluxdata.org&lt;/a&gt;, a SharePoint portal to a scientific data server. The server contains carbon-dioxide flux data gathered from a worldwide network of sensors, and provides SQL Server data cubes that help scientists collaboratively make sense of the data.
&lt;/i&gt;
&lt;/p&gt;
&lt;table&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;img width="250" src="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/fluxnet/baldocchi.jpg"&gt;
&lt;div&gt;
&lt;b&gt;Dennis Baldocchi&lt;/b&gt; is a professor of biometeorology at Berkeley. His research focuses on the physical, biological, and chemical processes that control trace gas and energy exchange between vegetation and the atmosphere. He also studies the micrometeorology of plant canopies.
&lt;/div&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;p&gt;&lt;hr /&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;div&gt;
&lt;img width="250" src="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/fluxnet/catharine-van-ingen.jpg"&gt;
&lt;b&gt;Catharine van Ingen&lt;/b&gt;, a partner architect with Microsoft Research in San Francisco, does e-science research exploring how database technologies can help change collaborative research in the earth sciences. She collaborates with carbon climate researchers and hydrologists. 
&lt;/div&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;p&gt;&lt;hr /&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Links&lt;/b&gt;&lt;/p&gt;
&lt;div&gt;&lt;a href="http://www.fluxdata.org"&gt;Fluxnet website&lt;/a&gt;&lt;/div&gt;
&lt;div&gt;&lt;a href="http://www.microsoft.com/mscorp/tc/carbon-climate-feature.mspx"&gt;MSR news article&lt;/a&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;

&lt;p&gt;
&lt;b&gt;JU:&lt;/b&gt; Dennis, you're someone who's pulling together a worldwide network of CO2 monitoring stations. Can you briefly explain how these devices work?
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;DB:&lt;/b&gt; Sure. Let me give you a bit of history. Back in the late 1950s, &lt;a href="http://en.wikipedia.org/wiki/Charles_David_Keeling"&gt;David Keeling&lt;/a&gt; made some of the first measurements of carbon dioxide concentration -- on Mauna Loa in Hawaii, in the Arctic, very remote locations. They saw an increase in the CO2 concentration in winter, and a decrease in summer. The increase is due to respiration in the biosphere, the decrease is due to photosynthesis. And on top of this they saw a trend due to fossil fuel combustion and logging of tropical forests. 
&lt;/p&gt;
&lt;p&gt;
These measurements were just CO2 concentrations. As atmospheric scientists, we know that changes in the atmospheric concentration are due to fluxes. We measure actual fluxes: moles of carbon dioxide, per meter squared, per second, between the atmosphere and the biosphere.
&lt;/p&gt;
&lt;p&gt;
We do it with a combination of sensors. One is a three-dimensional sonic anemometer, which measures up-and-down and lateral-and-longitudinal motions of the air, ten times a second. And simultaneously with new sensors we measure instantaneous change in CO2 concentration.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;JU:&lt;/b&gt; So it's a combination of sensing wind speed and sensing atmospheric gas.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;DB:&lt;/b&gt; Absolutely. We measure a covariance between the two, and theoretically that's related to the flux density.
&lt;/p&gt;
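&lt;p&gt;
The covariance Dennis mentions can be sketched in a few lines. This is only an illustration, not the FluxNet processing code: it assumes paired samples of vertical wind speed (m/s) and CO2 concentration (mol per cubic meter) taken over one averaging period, and the function and variable names are made up for this example. Real processing adds corrections that are not shown here.
&lt;/p&gt;
&lt;pre&gt;
```python
from statistics import fmean


def eddy_covariance_flux(w, c):
    """Covariance of vertical wind speed w and CO2 concentration c.

    Assumption for this sketch: w is in m/s and c is in mol per cubic
    meter, so the result is in mol per square meter per second.
    """
    if len(w) != len(c) or not w:
        raise ValueError("w and c must be non-empty and the same length")
    w_mean = fmean(w)
    c_mean = fmean(c)
    # Average the product of the fluctuations around each mean.
    products = [(wi - w_mean) * (ci - c_mean) for wi, ci in zip(w, c)]
    return fmean(products)
```
&lt;/pre&gt;
&lt;p&gt;
With samples taken ten times a second, one half-hour averaging period would feed about 18,000 pairs into a call like this.
&lt;/p&gt;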
&lt;p&gt;
&lt;b&gt;JU:&lt;/b&gt; And this population of sensors has been growing for 15 or more years?
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;DB:&lt;/b&gt; Yeah, my old lab in Oak Ridge, Tennessee made some of the first sensors we were using in the early 90s. Around then a company called &lt;a href="http://www.licor.com/"&gt;Licor&lt;/a&gt; started making a sensor that's about 15 centimeters long and shoots an infrared beam from source to detector. The air can blow through this sensor, and it's low power, doesn't need pumps, so it can be deployed in the middle of nowhere. Many of us run with solar power, so we have a PC that pulls an amp, then the sensor pulls another amp, so for two amps we can run a flux system.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;JU:&lt;/b&gt; As Catharine points out, there's a long tradition of large-scale collaboration in some scientific disciplines, but it's relatively new in other areas, and it sounds like this is one of those.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;DB:&lt;/b&gt; Yeah. I was a grad student in the 80s and I remember my professor having a desk full of data. People would knock on the door wanting to borrow it, and there was always some reluctance, it was really a single-investigator culture at the time.
&lt;/p&gt;
&lt;p&gt;
In many ways I credit our Italian colleagues, they were really gregarious and good at hosting wonderful workshops that started bringing people together. 
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;JU:&lt;/b&gt; So Catharine, how did Microsoft get involved in building out the scientific data server that supports this project?
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;CvI:&lt;/b&gt; It was serendipity. We had met folks at the &lt;a href="http://www-esd.lbl.gov/BWC/"&gt;Berkeley Water Center&lt;/a&gt; two ways. First through Jim Gray's interest in e-science and database applications. Second, one of the current heads of the Berkeley Water Center is an old friend of mine from grad school, Jim Hunt. We were talking about doing a hydrology project, then somehow my colleague at BWC on the computing side, Deb Agarwal, ran into Dennis, and we started talking.
&lt;/p&gt;
&lt;p&gt;
Dennis fit all of the criteria for how I like to engage with scientists. He was desperate, he had a problem that he didn't know how to solve, and that was important, because it meant he was willing to talk to us and teach us things.
&lt;/p&gt;
&lt;p&gt;
Also he had enough data to make things interesting for us. It's not petabytes, but we're talking about the hundred-gigabyte range, and the dataset is extremely diverse. I find it fascinating from an informatics point of view because it's a true scientific mashup to do the data analysis. You're taking the flux data that Dennis just described, as well as a lot of site properties, and other things from the literature, and trying to bring it all together.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;JU:&lt;/b&gt; There's a whole range of what you folks call ancillary data, which describes soil and vegetation and other aspects of the environment.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;DB:&lt;/b&gt; To give you an example, the meteorological data, from a database point of view, is fairly simple and regular. Our loggers give us half-hour data, so you get what's essentially an Excel spreadsheet. The rows are timestamped for each half-hour, and the columns are temperature, flux of water, solar energy, and so on. But it gets complex when you weave in the ancillary data. For example, you need to know the population of leaves that control these fluxes. You may measure that in a half-dozen spots, a half-dozen times per year. Then you need to understand leaf photosynthesis, and that's another set of measurements, and then soil texture, carbon, and water absorption, and all these measurements are at different depths, different times, it gets really complex.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;CvI:&lt;/b&gt; Another interesting aspect, from our side, is handling time. We all think time is linear...
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;DB:&lt;/b&gt; [laughs] Not according to Einstein...
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;CvI:&lt;/b&gt; [laughs] ... well ... so, since we're dealing with plant information, plants photosynthesize during the day. So rather than using wall-clock time, using the plants to tell us about day or night was really fascinating. In effect we're deriving a time window based on the time series data themselves, and for informatics folks, this was more fun than a barrel of monkeys. We've generalized the concept now, and applied it to a couple of other disciplines. Handling time has turned out to be one of the biggest areas of learning.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;JU:&lt;/b&gt; So what is FluxNet, actually, and how does the data get into the scientific server that you've built?
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;DB:&lt;/b&gt; It started at a workshop we held in Italy in 1995. From that, regional networks started blossoming. First off the ground was the EuroFlux network, then AmeriFlux in about 1997, then over time the Asians, the Canadians. NASA funded us for two cycles, and then things dried up as they decided to go to the moon and to Mars. Most recently we've been funded by NSF, which is funding a whole bunch of ecological networks. On the side, there's been funding to Oak Ridge National Lab, through NASA, to maintain the data acquisition and archive system. And then Deb and Catharine joined in to build value-added products through this FluxData project. 
&lt;/p&gt;
&lt;p&gt;
Sometimes I think we're like Tom Sawyer, we've got this fence to paint and all these people are helping us paint it.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;JU:&lt;/b&gt; Or like stone soup.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;CvI:&lt;/b&gt; It is like stone soup. From an informatics point of view, the way we think about it is that the data starts with tower owners -- and Dennis is a tower owner as well as a project overseer -- and flows to one of the network repositories, or directly to Oak Ridge where the data is stored.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;JU:&lt;/b&gt; OK, so your site, www.fluxdata.org, is not the repository, it's for analysis...
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;CvI:&lt;/b&gt; Yes. There are data archive centers, funded primarily by NASA, where you can contribute data, and where data is stored. The challenge for the scientist is to get from the raw data to the science, it's a classic last-mile problem. So the data flows from the repositories to the folks in Europe who are doing gap-filling and uniform processing, and it flows back to Oak Ridge for long-term storage, and it flows to us.
&lt;/p&gt;
&lt;p&gt;
We then make it available to researchers to download, and we provide the value-added summary products. So we're not at the front end gathering data, and we're not the archive, we're in the middle, solving that last-mile problem.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;JU:&lt;/b&gt; Part of that solution is to put the stuff into data cubes. Dennis wrote somewhere that while these have been used in financial analysis for a long time, their application to scientific analysis is new. It might surprise some people to learn that this way of looking at data isn't common in the scientific world.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;CvI:&lt;/b&gt; It actually isn't. OLAP databases, data cubes, have been around for a long time. I think I first saw one in the early 90s. But that was really commercial data, it was about finding how to make coupons for Oreos and milk. Scientific data is different in a couple of respects. First, it's much more dense. You tend not to always buy Oreos and milk together, but Dennis always reports CO2 flux, temperature, and precipitation together. The other difference is that a lot of the analysis for commercial data is not at the leaf nodes, it's about annual sales. Whereas a lot of science is actually at the leaf nodes, it's about looking at statistical variation in the half-hourly data.
&lt;/p&gt;
&lt;p&gt;
So we end up building different-shaped cubes. 
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;DB:&lt;/b&gt; And let me add that we'll present this data with gaps, for several reasons. One is that if there's a thunderstorm, it might cause the instrument to malfunction. Another is that we have to comply with meteorological steady conditions -- for example, steady winds. So we apply a lot of quality assurance to the data set, and that produces gaps, but any user of the data wants a continuous record. So we need to find ways to fill those gaps. 
&lt;/p&gt;
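&lt;p&gt;
To make the idea of gap filling concrete, here is a deliberately simple sketch. The interview does not say which method the network uses, so plain linear interpolation is swapped in purely as a stand-in; the function name and the use of None to mark a rejected half-hour are assumptions for this example.
&lt;/p&gt;
&lt;pre&gt;
```python
def fill_gaps(values):
    """Linearly interpolate interior None gaps in a half-hourly series.

    Leading and trailing gaps are left as None, because there is no
    measurement on both sides to interpolate between.
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    # Walk over each pair of neighbouring known points and fill between them.
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            frac = (i - left) / span
            filled[i] = filled[left] + frac * (filled[right] - filled[left])
    return filled
```
&lt;/pre&gt;
&lt;p&gt;
A series such as [0.0, None, 4.0, None] would come back with the middle value filled in and the trailing gap left untouched.
&lt;/p&gt;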
&lt;p&gt;
We also want to partition the fluxes, so we can understand mechanisms. We measure the net ecosystem exchange, but there's a component due to photosynthesis and a component due to respiration. By separating out day and night data we can derive these components, so there's all this value added to the data from the archive.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;JU:&lt;/b&gt; So I looked at some of your pivot tables, for example on sites by vegetation -- how are those being used?
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;DB:&lt;/b&gt; To do cross-site analysis. For example, we're interested in how length of growing season may affect net carbon exchange. When I did this analysis before I met Catharine, I had to open a bunch of spreadsheets and cut and paste, cut and paste. With the cubes, you press a button and the data's there. It really allows you to do a lot of quick what-if questions, and be creative. It makes our work quicker and easier.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;CvI:&lt;/b&gt; We're also doing a fair amount of sorting. You can sort along vegetation types, to see the difference between croplands and grasslands. We also know each of the sites that is a boreal forest, so you can look at just those, or just tropical forests. If the database has 900 site-years, you can select just the 200 that you need for a piece of analysis.
&lt;/p&gt;
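&lt;p&gt;
The kind of slicing described here, selecting site-years by vegetation type and summarizing across sites, can be sketched without a cube at all. The records below are invented placeholder values, not FluxNet data, and the field layout and function names are assumptions for illustration only; the real system uses SQL Server data cubes.
&lt;/p&gt;
&lt;pre&gt;
```python
from collections import defaultdict

# Hypothetical site-year records (made-up numbers, arbitrary units):
# (site, vegetation type, year, annual net carbon exchange)
records = [
    ("site_a", "grassland", 2003, -100.0),
    ("site_a", "grassland", 2004, -200.0),
    ("site_b", "cropland", 2003, -50.0),
    ("site_c", "boreal_forest", 2003, -80.0),
    ("site_c", "boreal_forest", 2004, -120.0),
]


def select(records, vegetation):
    """Keep only the site-years of one vegetation type."""
    return [rec for rec in records if rec[1] == vegetation]


def mean_by(records, key_index, value_index=3):
    """Average one value column, grouped by another column."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key_index]].append(rec[value_index])
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}
```
&lt;/pre&gt;
&lt;p&gt;
Calling mean_by(records, 1) groups by vegetation type, while mean_by(records, 2) would group the same site-years by year instead, which is the pivoting step that used to mean cutting and pasting between spreadsheets.
&lt;/p&gt;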
&lt;p&gt;
&lt;b&gt;JU:&lt;/b&gt; Is it fair to say that until this was brought together it wasn't possible to do this?
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;CvI:&lt;/b&gt; It was possible, but just really tedious.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;DB:&lt;/b&gt; Back when the network was small, we did a workshop in 2000, and we had about 100 site-years of data from 30 sites. It was easy to be clunky. But now we have 900 site-years from 400 sites, and you just can't use the old methods. We have to go modern. 
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;JU:&lt;/b&gt; What kinds of collaboration effects are you seeing? You've written that it's a big challenge to motivate scientists to contribute the ancillary data in a standard way. Getting the stuff in front of people like this, in a common presentation with explanations about what all the variables mean, and how to report them, should help get everybody onto the same page.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;CvI:&lt;/b&gt; I see a couple of things. First, we're starting to hear from individual tower owners asking us questions, and telling us what's wrong. "I'm sorry, my site isn't really at that lat/lon." Or: "My leaf index is really this."
&lt;/p&gt;
&lt;p&gt;
They see their data being used in papers: we're hosting access for about 60 paper-writing teams. As the papers come to fruition, we're actually tracking what sites they're using, so it's possible to go in and find out who's using your data.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;DB:&lt;/b&gt; It's motivating. I know my post-doc is so excited when she finds out people are using this data. 
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;JU:&lt;/b&gt; That explains why you have an update feature on the site?
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;CvI:&lt;/b&gt; Absolutely. We know there are corrections that need to be made. Treating it as a living, breathing data set, and being able to respond in an organized way to changes...
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;DB:&lt;/b&gt; As more eyes look at it, they can help us fix it. Especially our own data. You look at it and don't see the problem, but when someone tries to use it...oops. In fact we found a problem with our solar heat flux recently. We were doing the correct calculations from 2000 to 2003, then we changed algorithms, and the staff changed, and all of a sudden there was a glitch in how the data were being processed. Finally some scientist from UCLA wanted to use the data, and he plotted it up, and found the problem. So now we're correcting that. 
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;CvI:&lt;/b&gt; One of the things that happens when you plot data over time is that you can see any errors in time reporting. One site was off by a couple of months. The data looked fine when you plotted just that site. But if you plot it by nearby sites, suddenly you see the problem. That's the kind of processing -- bringing the data into focus -- that we're engaged in right now.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;JU:&lt;/b&gt; So you've got the data online, and tools for viewing and updating the data, but there's also a conversational infrastructure. You have a blog, there are places for people to add comments and have discussions, and all of that is kept together with the data. Catharine, you've said that the role of data curation in science is emerging, and will be key as we increasingly see these mega collaborations with hundreds or even thousands of people working on the same data. You need an environment in which those conversations can be centralized in the same way the data is centralized.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;DB:&lt;/b&gt; There's also almost a traffic-cop role too, just to avoid redundant efforts. There are several obvious ideas, and multiple groups may want to pursue them. In the long run it's a waste of effort if people are doing the same redundant analysis, and only one paper may get published. If we can get these people to talk to each other, and interact, that's critical.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;JU:&lt;/b&gt; As Catharine puts it, investing the same effort in publishing data as you would in writing a paper is something that's not yet socialized. 
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;CvI:&lt;/b&gt; No, it's not. We see again and again how difficult it is to put the data in a box and tie a bow around it, so people can reuse it. It's very hard, but very important, long-term, for a lot of these environmental problems.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;DB:&lt;/b&gt; So Catharine, by marking these data sets and giving them some kind of provenance, is this a way scientists can get credit for the work?
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;CvI:&lt;/b&gt; Well, the challenge isn't only enabling that, but also teaching the funding agencies that it's just as important.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;JU:&lt;/b&gt; Exactly. I've talked to Timo Hannay about this -- he's the guy who runs the web stuff for Nature Publishing -- and this is a huge interest of his. Science is an enterprise that runs on people getting credit for publishing papers, not data. I gather that often papers are published as a thin gloss on a data set, just to get the data out there. There hasn't been a model for publishing the data itself. The fact that the data from somebody's individual tower can be traced back, and then traced through its use in follow-on papers -- that's huge. Your post-doc can not only get excited about other people using her data, she can get credit for their citations of it.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;JU:&lt;/b&gt; So, the climate effect of CO2 is obviously a hot topic. What have we actually learned at this point?
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;DB:&lt;/b&gt; One paper used this network in combination with remote sensing to see how carbon exchange across Europe responded to the drought and heat wave in 2003. So here was this network poised to measure how the whole biosphere responded to this climate assault. 
&lt;/p&gt;
&lt;p&gt;
The network has also been successful with what we call emergent scale processes. One that came out strongly is that plant canopies respond to light more efficiently if the light is diffuse, as opposed to when there are clear skies. That's a process we haven't seen before.
&lt;/p&gt;
&lt;p&gt;
Another thing we found, because we have continuous records, is that if there's a summer rain event, microbes turn on immediately and produce huge amounts of respiration that we never envisioned before. Scientists in the past would miss these extreme events, but by having continuous measurements we can see how the system responds. 
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;JU:&lt;/b&gt; But you wouldn't argue for long-term trends in the 15 or so years of data you've collected?
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;DB:&lt;/b&gt; If there are long-term trends, they seem more related to ecosystem dynamics. Many of the forests under study were disturbed at the turn of the century, so they're going through that natural cycle of growth, maturity, and decay. Those ecological features lay on top of any potential climate trends.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;JU:&lt;/b&gt; So it's more about having an infrastructure in place that allows us to have the data in hand, and then make some predictions?
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;DB:&lt;/b&gt; Yes. Now in fact, one of the things we are seeing is a change in the length of the growing season. As things have gotten warmer, the spring comes earlier, and it's really affecting carbon uptake in the citrus forests. But the big unknown is that if you have an earlier spring you might also get a summer drought, so you have an increase in carbon in the spring, and a decrease in summer, and the two factors may cancel out. But with our measurements we can see the mechanisms, we can understand and parse out what's happening and why. Whereas in the past, scientists would cut down trees and get tree rings and take one integrated snapshot for the whole year. But they wouldn't understand why, because those tree rings were also affected by drought and temperature and ozone and elevated CO2 and other issues.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;CvI:&lt;/b&gt; It's really a great time to be doing this stuff, because you're at the juxtaposition of social need, scientific need, and the availability of cheap technology.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;DB:&lt;/b&gt; And our NSF grant encourages us to do outreach, so this is a great opportunity to do that.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;CvI:&lt;/b&gt; Jim Gray always used to point out that the post-docs are the ones in any collaboration who most embrace new technology, and move the entire collaboration forward. Knowing the guys over in Europe that's certainly true, and you can see it happening with your own post-docs, Dennis.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;JU:&lt;/b&gt; So how are these cubes getting built, Catharine? What was the collaboration between you and the scientists?
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;CvI:&lt;/b&gt; We're lucky to be starting with a data set that is very well processed. As to building the rest, Dennis gave us, gosh, I looked at 300 of his graphs. I also got a similar collection from two of his other colleagues. I went through all the graphs and papers to try to understand how the data is manipulated and displayed. 
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;DB:&lt;/b&gt; That's a good idea. I didn't realize you did that.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;CvI:&lt;/b&gt; Oh yeah. [laughs]
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;DB:&lt;/b&gt; That would be helpful, because you see the kinds of products we're trying to create from these databases.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;CvI:&lt;/b&gt; Absolutely. I started by classifying the graphs into time-series graphs, scatterplots, and then everything else. Then I waded through how everything was sorted, searched, filtered, trying to figure out how to organize the data to enable that class of graphs.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;DB:&lt;/b&gt; So Catharine, there are a bunch of graphs I'd like to replot with this new database.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;CvI:&lt;/b&gt; Well Dennis, you and I should have lunch and we should figure out how to rip out a bunch of graphs. 
&lt;/p&gt;
&lt;p&gt;
So, along the way we realized that scientists will often make 50 graphs, throw away 48, and keep two. The ability to make a lot of graphs rapidly and simply usually requires some kind of scripting, and that's where you start leaving Excel and going into MATLAB or another scientific analysis tool.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;DB:&lt;/b&gt; Yeah, I'm using MATLAB a lot nowadays, and I'm seeing things I never saw before. I like having the script files because it gives me some history of what I was looking at.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;CvI:&lt;/b&gt; That's why we decided to connect MATLAB to the cube, so you can browse the reports we make in Excel, or go directly through MATLAB. Again, it's solving that last-mile gap to the scientist's house.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;JU:&lt;/b&gt; Well this has been great, thanks!
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;DB:&lt;/b&gt; Yeah, thanks. Catharine, we should get together and talk about some graphs.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;CvI:&lt;/b&gt; Thanks Jon. And thanks Dennis. Are you in your office? I'll call you later this afternoon.
&lt;/p&gt;&lt;img src="http://perspectives.on10.net/blogs/jonudell/Making-sense-of-C02-data/WebViewBug.aspx?EVT=0" height="1" width="1" alt="" /&gt;</description><comments>http://perspectives.on10.net/blogs/jonudell/Making-sense-of-C02-data/</comments><link>http://perspectives.on10.net/blogs/jonudell/Making-sense-of-C02-data/</link><pubDate>Thu, 03 Apr 2008 15:26:00 GMT</pubDate><guid isPermaLink="true">http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/fluxnet/fluxnet.wma</guid><evnet:views>628</evnet:views><evnet:viewtrackingurl>http://perspectives.on10.net/blogs/jonudell/Making-sense-of-C02-data/WebViewBug.aspx?EVT=0</evnet:viewtrackingurl><evnet:previewtext>&lt;p&gt;&lt;em&gt;In this podcast, MSR researcher Catharine van Ingen and Berkeley micrometeorologist Dennis Baldocchi talk with Jon Udell about their collaboration on &lt;a href="http://fluxdata.org"&gt;www.fluxdata.org&lt;/a&gt;, a SharePoint portal to a scientific data server. The server contains carbon-dioxide flux data gathered from a worldwide network of sensors, and provides SQL Server data cubes that help scientists collaboratively make sense of the data. &lt;/em&gt;&lt;/p&gt;
&lt;em&gt;&lt;/em&gt;</evnet:previewtext><media:group><media:content url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/fluxnet/fluxnet.mp3" expression="full" duration="2400" fileSize="19524480" type="audio/mp3" medium="audio" /><media:content isDefault="true" url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/fluxnet/fluxnet.wma" expression="full" duration="2400" fileSize="19750759" type="audio/x-ms-wma" medium="audio" /></media:group><enclosure url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/fluxnet/fluxnet.wma" length="19750759" type="audio/x-ms-wma" /><dc:creator>Jon Udell</dc:creator><slash:comments>0</slash:comments><wfw:commentRss>http://perspectives.on10.net/blogs/jonudell/Making-sense-of-C02-data/RSS/</wfw:commentRss><trackback:ping>http://perspectives.on10.net/blogs/jonudell/21860/Trackback.aspx</trackback:ping><category>collaboration</category><category>data curation</category><category>science</category></item><item><title>Cluster computing for the classroom</title><description>&lt;table&gt;
    
        &lt;tr&gt;
            &lt;td&gt;&lt;img width="280" alt="" src="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/hpclabs/kyril_faenov.jpg" /&gt; &lt;br /&gt;
            &lt;b&gt;Kyril Faenov&lt;/b&gt; is the General Manager of the Windows HPC product unit. Before founding the HPC team in 2004, Kyril worked on a broad set of projects across Microsoft, including running the planning process for Windows Server 2008, co-founding a distributed systems project in the office of the CTO, and developing scale-out technology in Windows 2000. Kyril joined Microsoft in 1998 as the result of acquisition of Valence Research, an Internet server clustering startup he co-founded and grew to profitability by securing MSN, Microsoft.com and some of the world's other largest web sites as its clients. &lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;&lt;hr /&gt;
            &lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;&lt;img width="280" alt="" src="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/hpclabs/rich_ciapala.jpg" /&gt; &lt;br /&gt;
            &lt;b&gt;Rich Ciapala&lt;/b&gt; is a program manager in Microsoft HPC++ Labs, an incubation team within the Windows HPC Server product unit. Rich joined Microsoft in 1992 and has held a number of different positions in technical sales, Microsoft Consulting Services, the Windows Customer Advisory team and the Visual Studio product team. &lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;&lt;hr /&gt;
            &lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;
            &lt;p&gt;&lt;b&gt;Links&lt;/b&gt;&lt;/p&gt;
            &lt;div&gt;&lt;a href="http://labs.microsofthpc.net/compfin"&gt;Microsoft HPC++ CompFin Lab&lt;/a&gt;&lt;/div&gt;
            &lt;/td&gt;
        &lt;/tr&gt;
    
&lt;/table&gt;
&lt;h2&gt;Kyril Faenov and Rich Ciapala discuss a new HPC++ Labs project that enables students to run computation-intensive experiments involving large amounts of financial data. &lt;/h2&gt;
&lt;br /&gt;
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; What Rich just demoed, which we'll show in a screencast, is how a financial model can be deployed to a server that acts as a front-end to a compute cluster. It's a nice easy way for students to use a model developed by a professor, select a basket of securities, run a very intensive computation on them against large chunks of data, and get answers back in an Excel spreadsheet. The bottom line is that the students can run an experiment using a level of computing power that was never before so easily accessible. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;KF:&lt;/b&gt; Yeah, because of the complexity involved in deploying systems like that, acquiring the data, and curating it, a lot of universities don't have this kind of infrastructure in place. So for a number of students who haven't done this before, this will make it available for the first time. For others who have, it will make it quite a bit easier. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; Now these are not computer science students who are learning about high performance computing, and about writing programs for parallel machines, these are students who are learning about financial modeling, and this just makes a tool available to them that can accelerate that. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;KF:&lt;/b&gt; Precisely. Most of our HPC customers are scientists, or engineers, or business analysts, not computer scientists. They're folks who use mathematics, statistics, differential equations ... sometimes not even math directly, but applications that encode these mathematical models to do research, or engineering, or risk modeling, or decision making. To them it's just a tool, and they want to use it in the way they use PCs today, as transparently and straightforwardly as possible. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; What's the situation today for most people? In the case of the covariance model Rich showed in the demo, if it weren't being done like that, how would it be done? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;KF:&lt;/b&gt; You can do it in Excel, or MATLAB, or SAS, on the workstation. So you'd acquire the data, and use your preferred tool ... &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; ... and wait a long time ... &lt;/p&gt;
&lt;p&gt;&lt;b&gt;KF:&lt;/b&gt; ... and wait a long time. And if you want to do a significant amount of data -- like a year's worth, for a large number of stocks -- it might not even be possible at all. &lt;/p&gt;
&lt;p&gt;Or you might load it up into a server, but then you have to figure out how to write an application, how to deploy it out to the server, then figure out how to submit the data to the model, pull it back, integrate into the visual analytic process. &lt;/p&gt;
&lt;p&gt;This multi-step process is exactly what our HPC customers are running into. They're expressing the models and doing the design on the workstation, using any number of tools. They do the analysis of the results, and visualization, on the workstation. But large-scale computation runs somewhere else. It might be in their organization, it might be out on the Internet, but it's a very disjointed process. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; There are clusters out there in academia, and there are people doing these kinds of things, but the point is that hasn't been woven together yet. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;KF:&lt;/b&gt; That's right. In 2004 the U.S. government published an assessment of U.S. competitiveness in high performance computing. The first recommendation was, and I'm quoting: &lt;/p&gt;
&lt;blockquote&gt;Make high performance computing easier to use. Emphasis should be placed on time to solution, the major metric of value. A common software environment that spans desktop to high-end systems will enhance productivity gains. &lt;/blockquote&gt;
&lt;p&gt;That's what we're starting to see in the HPC community. Not just getting the systems running as fast as possible, but figuring out how the workflow, the creative element of the scientific process, can be optimized. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; So, Rich and I talked about how the particular model used in his demo is in a class called &lt;i&gt;parameter sweep&lt;/i&gt;, which he distinguishes from the more distributed and chatty kinds of applications. In this case, you can send a batch of data down to a node, it can think about it for a while then give back an answer, and there doesn't need to be much communication. Is that the optimal scenario for this architecture? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;KF:&lt;/b&gt; Actually, it's optimized for a broad range of HPC applications. In fact, the major goal of the first release of the product, Compute Cluster 2003, was MPI-style [message passing interface] applications. There are a lot of these in engineering and in the environmental space. You're modeling some kind of physical process, and you build a mesh or grid that takes a large physical process or body, partitions it, does computations on local areas, but then has to frequently exchange data across the partitions. Think about a car crash simulation. You might partition the hood of the car into a lot of pieces, every one computed separately, but as the deformation is happening the forces need to be exchanged. Or weather modeling, where heat exchange happens across partitions. &lt;/p&gt;
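&lt;p&gt;For contrast with the MPI-style applications described here, the parameter-sweep pattern raised in the question can be sketched as follows. This is a single-machine illustration using Python's standard thread pool, not the Windows HPC job scheduler or its API; the evaluate function is a made-up stand-in for an expensive model run, and each batch is assumed to be fully independent of the others. &lt;/p&gt;
&lt;pre&gt;
```python
from concurrent.futures import ThreadPoolExecutor


def evaluate(batch):
    """Hypothetical stand-in for one expensive, independent model run."""
    return sum(x * x for x in batch)


def parameter_sweep(batches, workers=4):
    """Run evaluate on every batch; no batch talks to any other batch."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map preserves the input order of the batches in its results.
        return list(pool.map(evaluate, batches))
```
&lt;/pre&gt;
&lt;p&gt;Because the batches never exchange data, the work can be scattered across workers and gathered at the end, which is what separates a sweep from a mesh computation whose partitions must swap boundary values at every step. &lt;/p&gt;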
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; There's a high degree of data interdependence. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;KF:&lt;/b&gt; Exactly. When you have an interdependent problem, you use MPI for that. We worked with the team at Argonne National Labs that releases the open source reference implementation of MPI, and we've adopted that in our product, optimized the performance and security on Wi