How do I fix "Error 503" when accessing the DataFacet taxonomy Service Application?

The "Error 503" means that the DataFacet web service application is not available. The most common reason is that the application pool associated with the web service application cannot be started because of invalid credentials. If you installed DataFacet under a specific account and later changed the password or privileges of that account, the DataFacet application service will not start because its application pool will not start.
1) Make note of the time. If you can afford the luxury, you may want to clear out the system and application event logs to make it easier to spot error messages, but it is not necessary.
2) Run IISRESET to try to restart all of the SharePoint application services.
3) After IISRESET has run, make sure that SharePoint Central Administration and the Managed Metadata service are both up and running by visiting them in the browser.
4) Click on the DataFacet service application link. If the page shows, you probably just needed a remedial IISRESET and you are good to go.
5) If you get the 503 error again, go to the event log and look for a succession of two warnings and an error from the WAS source. The error will be something like "Application pool xxxxxxxxxxxxxxxxxxxxxxxx has been disabled. Windows Process Activation Service (WAS) encountered a failure when it started a worker process to serve the application pool". The two warnings will hint at why the application pool failed to start, probably having to do with authentication.
6) Now that you know the identity of the application pool (memorize the first few characters so it is easy to find), visit IIS Administration and verify that the application pool is indeed stopped.
7) To change the application pool's credentials, select the application pool and click on "Advanced Settings". This will bring up a dialog box with application pool properties.
8) Under the Process Model section, click on the [...] icon to change the Identity value. Enter valid credentials for the application pool user. The application pool user must have read-write access to the SharePoint term store, specifically the Data Facet term store group. After that, DataFacet should work again.
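
If you copy the WAS error text out of the event log, a few lines of script can pull out the application pool name so you can find it quickly in IIS Administration. This is only a sketch: the pattern is based on the message wording quoted in step 5, the function name is made up for illustration, and it assumes the pool name contains no spaces.

```python
import re

# Pattern assumed from the error wording quoted above; the name may or may
# not be wrapped in single quotes in a real event log entry.
POOL_DISABLED = re.compile(r"Application pool '?(?P<name>[^\s']+)'? has been disabled")


def disabled_pool_name(message):
    """Return the application pool name from a WAS error message, or None."""
    match = POOL_DISABLED.search(message)
    return match.group("name") if match else None


example = (
    "Application pool 3f2a9c has been disabled. Windows Process Activation "
    "Service (WAS) encountered a failure when it started a worker process "
    "to serve the application pool"
)
print(disabled_pool_name(example))
```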

It is recommended to have a separate column for each term set instead of clumping them all into one.

If a document is checked out by the user and the annotator is run, what will happen? Will the annotator still index and tag?

Yes.

Can DataFacet be installed directly on a live (production) server?

Yes, but please don't. DataFacet will require at least one IIS reset before going live. Plan on about an hour of downtime to upgrade to a new version of DataFacet on a production server.

Does DataFacet require the Managed Metadata service application?

Yes. DataFacet will not operate without a working Managed Metadata Service. You will need to configure an Enterprise Keywords service application.

Does DataFacet support term change auditing?

For example, if a term has been edited or a rule added, is this audited and accessible from the Central Admin Server App user interface?

We do have logging, but it has not been designed specifically for auditing purposes. Better logging and auditing support is planned for a future release.

How does DataFacet handle multiple SharePoint instances?

For example, DEV, PROD, and TEST, and the subsequent synchronization of terms across the various environments?

You can export DataFacet taxonomies to the industry-standard SKOS format and import them into another SharePoint farm. We have a user story for a more automated process, but for now, this approach can be automated with PowerShell.

If I change a term in one language, will it affect the same term in a different language in another taxonomy?

Currently, changes to one translation of a taxonomy will not change the translation of the term in another taxonomy, as they are separate objects.

We have SharePoint 2007. What Search Engines are supported?

DataFacet for SharePoint 2010 takes full advantage of the new Managed Metadata feature, which is not available in SharePoint 2007. Because of this, the architecture is fundamentally different, and DataFacet for SharePoint 2007 does not support tagging of documents stored in SharePoint Document Libraries.

We do support an interim solution for customers that are running SharePoint 2007 but are on an upgrade path to SharePoint 2010. DataFacet Foundation with RebelSearch is our own search engine based on the popular Lucene search libraries. Advantages include blazing-fast performance, support for deep facet navigation, excellent integration with SharePoint (and good integration with SharePoint 2007), and low cost relative to other options.

In addition, we support several other Enterprise Search applications if you happen to have them already, including Coveo, Exalead, Solr and MarkLogic. There is some ramp-up time associated with all of these solutions. These can be sensible alternatives for an organization that has already standardized on a third-party enterprise search engine.

What permissions are required to install and run DataFacet?

To install DataFacet successfully, you will need to run the installer under an account that has:
  1. Local Admin rights. We are copying files on the local file system and adding assemblies into the GAC.
  2. SharePoint Farm Admin rights (i.e., an account with write permissions to the Config DB). We are installing SharePoint features, registering services, provisioning the service application and more.
  3. Term Store admin rights. We are adding a term group called "Data Facet" at installation time.

What SharePoint platforms does DataFacet support?

There are two different versions of DataFacet for SharePoint: DataFacet for SharePoint 2010 Server and Enterprise, and DataFacet for SharePoint 2010 Foundation Server and SharePoint 2007. DataFacet for SharePoint 2010 is fully integrated with SharePoint and requires Enterprise Keywords and either SharePoint Enterprise Search or FAST for SharePoint 2010 to be installed. FAST for SharePoint requires SharePoint 2010 Enterprise edition. Another search engine option is DataFacet RebelSearch, which provides a high-performance search engine with complete deep facet navigation that is native to SharePoint, yet has a small footprint and is easy to configure.

What version of .NET is required for AR Document Annotator for Windows?

Currently, all Windows-based AR and DataFacet products require at least .NET Framework version 3.5 Service Pack 1. DataFacet for SharePoint has the same minimum requirements as SharePoint.

Where do the rules get stored in SharePoint?

The rules are stored as properties on the term store objects. Ultimately they are stored in the SharePoint database, but this is an opaque data store.

Where is the best place to install DataFacet in a SharePoint farm environment?

Additional Information: For example, we have a three-farm 2010 configuration where there is a document collaboration farm, a social farm, and a shared application service farm. Where would DataFacet need to be installed in this case?

Answer: When you install DataFacet on a farm, you need to run the installer on every machine in the farm. The installer will automatically install the appropriate features for the topology of the server. So, if a server is a WFE without shared application services, only the DataFacet UI components will be installed (templates, features). The DataFacet taxonomy service will be installed with other shared application services.

Are there plans for a DataFacet user group or user forum?

Not yet. We are looking into options for this.

Does DataFacet support SharePoint 2010 Foundation?

SharePoint Foundation does not have a term store. You must be using SharePoint 2010 Standard or Enterprise.

How much time would you estimate a "taxonomy administrator" would have to devote to the governance of the taxonomies?

Depending on the size of the company, anywhere from 4-8 hours a week. For larger companies this is often a full-time job and can even have multiple people engaged in managing the taxonomies.

What business role usually should be in charge of the taxonomy administration?

This role is usually overseen by content administrators or similar business roles. IT roles and SharePoint architects are not necessarily the best choices for this task. Someone who is in charge of Master Data Management can typically transfer that knowledge to creating and managing taxonomies. In addition, company librarians can do an excellent job managing taxonomies; however, they are typically only present in large corporate environments.

Where do I see a sample of your taxonomies?

Our general Business taxonomy is available as a free download and is provided in CSV format. Please visit http://www.datafacet.com/ for details. The XML version (in encrypted SKOS format) of the DataFacet General Business taxonomy is shipped with DataFacet. You can find it in the 14 hive, commonly:

C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\14\WebServices\TaxonomyService\StaticData\DataFacet_GeneralBusiness_SKOS.tax

Are OWL formatted taxonomies compatible with DataFacet?

OWL is not supported natively, but it is very easy to convert from OWL to SKOS. We do support SKOS natively. We can help you with that. Just ask.

Can you import taxonomies in SharePoint Online (Office 365)?

Yes, taxonomies and the term store operate the same way in SharePoint Online as they do in SharePoint 2010. However, the DataFacet Automatic Tagging Engine is not currently compatible with SharePoint Online.

How can I import taxonomies in XML into SharePoint 2010?

SharePoint 2010 itself does not support importing taxonomies in XML format. It only supports a specific variant of CSV. See http://technet.microsoft.com/en-us/library/ee424396.aspx for the SharePoint CSV import format. However, if you have taxonomies in XML which you wish to import into SharePoint, we can assist with transforming the data into the appropriate format on a services basis. DataFacet supports two XML schemas for import. It also supports a limited subset of RDF/XML documents, with constantly evolving fidelity.

How does DataFacet treat ASPX documents?

ASPX documents are the same as HTML documents. DataFacet does not process them any differently from any other content type: they are handled by the protocol handlers and iFilters in SharePoint.

Is there a way to manage special characters (é à ...) during the import of taxonomies into the term store?

The Term Store only imports .CSV files, and special characters do not import correctly into the Term Store, even though they can be entered and used in the .CSV file. Both UTF-7 and UTF-8 encodings produce the same result. We have had to adjust or edit special characters in the Term Store as necessary. One workaround is to identify terms with special characters in advance within the .CSV, note any duplications, and copy or reuse those terms once they've been adjusted within the Term Store. This way, the term with the special character only has to be edited once and then copied where necessary, rather than importing the same term many times and having to edit every copy within the Term Store. It is not an elegant solution, for sure, but it works around a shortcoming of the Term Store. Also, DataFacet for SharePoint does handle UTF-8 encoded XML files, so there is no problem importing taxonomy files in any of the XML formats that we support (e.g., SKOS and ARTX).
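
The "identify terms with special characters in advance" step of that workaround can be scripted. The sketch below is not part of DataFacet; the function name and the sample column headers are made up for illustration, and it simply treats any cell containing a non-ASCII character as a term that will need attention, counting how often each one repeats.

```python
import csv
import io
from collections import Counter


def non_ascii_terms(csv_text):
    """Count every CSV cell that contains at least one non-ASCII character."""
    counts = Counter()
    for row in csv.reader(io.StringIO(csv_text)):
        for cell in row:
            term = cell.strip()
            if term and not term.isascii():
                counts[term] += 1
    return counts


# Hypothetical sample data; a real import file will have its own columns.
sample = (
    "Level 1 Term,Level 2 Term\n"
    "Beverages,Café\n"
    "Venues,Café\n"
    "Venues,Crèche\n"
)

for term, count in non_ascii_terms(sample).most_common():
    print(f"{term}: appears {count} time(s)")
```

Terms that appear more than once are the ones worth fixing once in the Term Store and then copying or reusing.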

Many categories, like job roles, are already maintained in business systems. How would we go about integrating these systems into the term store?

This would require either custom development through an API or a static import based on an export from your current system.

Can DataFacet be configured to update content types for an object?

Not yet. It is an active field of research. We do, however, support SharePoint Content Organizer and multiple content types. With Content Organizer, you can set up routing rules based on tags added by the DataFacet annotator. You can also assign specific taxonomy columns to specific content types to limit which content types are annotated with specific taxonomies.

Can DataFacet tag content that is not in a document library, for example, calendars, tasks, issues, wikis?

Yes, it can. The same option is available on the List Settings page. However, you need field rules such as "title:blah" to get it working for non-documents, since tasks have no stream content. (If it is not working right now, it is a bug.)

The name of a file is metadata by definition, so by default all documents will have that metadata stored in the path field when they are imported. The next logical step is to classify a document according to its file name. In fact, SharePoint has that feature built in:
http://blogs.technet.com/b/speschka/archive/2009/10/30/sharepoint-2010-content-organizer-part-1-a-cool-new-feature-for-managing-your-content.aspx
http://msdn.microsoft.com/en-us/library/ee558288.aspx
DataFacet can work with Content Organizer to augment the classification with auto-tagging based on document content. Both features work together to provide the ultimate classification and tagging combo.

Can the terms in DataFacet term store be applied to documents outside of SharePoint?

It depends. We have the ability to connect to different types of search and content management systems, but we need to understand how the medical library is stored and retrieved. For the most part, if a system can read a document and exposes that text via a pipeline (SharePoint, Oracle, etc.), then the process is "I am checking in this document, and if you want to read/annotate, use this API". We can evaluate that API against the market to see if we want to spend time and resources building an adapter for that pipeline. But if they are just documents sitting on a file share or web server without some sort of centralized index, then we currently do not tag them, because the tags would have to be inserted into the document to be stored somewhere, and that is something that we do not want to do.

Can there be a choice to not remove tags?

I don't believe we have tested this precise case. We do honor ACL permissions on the term store - but I think this is a more granular level.

Can users override the automatic tagging?

Users are able to make changes to the tags that are applied by the automatic tagging engine. A user is able to delete tags or to add additional tags of their own. However, if you trigger a document set to be retagged based upon new rules or new taxonomies, the document will be re-tagged based upon the tagging rules and any manual changes made by the user will be overridden. In future versions of DataFacet, there will be options for retaining changes made by end users even in cases of re-tagging.

Can you modify, add or delete tags that are added automatically by the Annotator?

DataFacet will overwrite all taxonomy metadata every time a document triggers an "update" event. You can manually modify the terms, but any changes to the document or metadata will only persist until the next time the document is updated.

Does DataFacet modify original document files in any way?

No. All taxonomy information is added to managed metadata site columns in SharePoint. Taxonomy annotations are not propagated back to the source document properties.

Does DataFacet require administrative rights to install?

Yes. DataFacet is a SharePoint Application Service. It requires farm administrator rights to install and run.

Does DataFacet require an internet connection to operate?

No. DataFacet only requires access to local resources and resources that will be indexed through the FAST or SharePoint crawlers and connectors.

Does DataFacet support SharePoint publishing pages?

We prefer FAST for a variety of reasons, not just the crawling capabilities. Primary for us on the technical side is the ability to use the "deep facet" navigation that is available in FAST search but is not available in standard SharePoint search. FAST is a much more scalable search engine than the SharePoint native search engine, but the trade-off is an increase in complexity. FAST is quite resource hungry. However, the payoff in search result accuracy is substantial, especially for our taxonomy navigation facets. Another feature is the more extensible pipeline. SharePoint search crawlers are not extensible like FAST crawlers are, so there is no interface for us to link into to use BCD. We support SharePoint search only for web sites, file systems, and native SharePoint data sources.

How do you control which content is tagged with which taxonomy information?

How do you turn tagging on for all document libraries in a site? There is no way to turn it on for a specific site only. You can, however, turn it on for a specific document library.

How do you turn it on for all document libraries in a site collection? Activate or deactivate the Site Collection Feature "DataFacet Annotator toggle for all Document Libraries in this Site Collection". When you activate this feature, DataFacet will enable the Annotator for all document libraries in this site collection. You can then manually disable the Annotator on individual document libraries through Library Settings. When you deactivate this feature, DataFacet will disable the Annotator for all document libraries in this site collection. This requires the feature "DataFacet Annotator Enabled on this Site Collection" to be activated.

How does one re-index and re-tag documents already in SP2010 after changes to the taxonomy have been made?

We have PowerShell scripts to automatically re-index documents if major changes have been made.

How is a tag deleted from a document that DataFacet applies to a document?

In the current behavior, there is no way to permanently delete a taxonomy tag from a document. You can temporarily remove a tag at check-in time, but any subsequent update of the document will re-tag the document from the taxonomy. A way to permanently exclude a tag from a document is to add an exclusion rule to the term rule in the taxonomy itself for every document that should not have the tag. You will need to know a unique identifier for the document (e.g., URL or Path) and add an exclusion rule. For example, if the preferred term is "Gemstone" and the rule is (gemstone OR gem OR "precious stone" OR "semiprecious stone") and you want to exclude the document with the path \rockstore\minerals\organic\coal\anthracite.doc, you can modify the query to look like this: (gemstone OR gem OR "precious stone" OR "semiprecious stone") -PATH:"\rockstore\minerals\organic\coal\anthracite.doc"
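
If several documents need to be excluded from the same term, the rule text can be assembled mechanically. The helper below is only a sketch with a made-up function name. It reproduces the -PATH:"..." form from the example above and assumes that additional exclusions can be appended as further -PATH clauses, which is not confirmed here, so test the resulting rule in the taxonomy manager before relying on it.

```python
def exclude_paths(rule, paths):
    """Append one -PATH:"..." exclusion clause per document path to a term rule."""
    clauses = [f'-PATH:"{path}"' for path in paths]
    return " ".join([rule, *clauses])


rule = '(gemstone OR gem OR "precious stone" OR "semiprecious stone")'
excluded = [r"\rockstore\minerals\organic\coal\anthracite.doc"]

# Prints the rule followed by the exclusion clause for the one path above.
print(exclude_paths(rule, excluded))
```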

What is the installation sequence for installing on a farm?

1. Copy the DataFacet installer .exe file to each WFE and the main Application Server on the farm.
2. RDP to the server where you want the DataFacet service application running.
3. Double-click the .exe.
4. Select the farm installation parameters:
    a. Check "Provision service application".
    b. Check "Install service application".
    c. Select the Application Server that will host DataFacet from the Target Server dropdown.
5. For each additional server in the farm:
    a. RDP to the server.
    b. Double-click the .exe.
    c. Uncheck the "Provision service application" and "Install service instance" checkboxes.

This simply copies the code that is required by the feature, but doesn't deploy anything to the WFE.

What permissions are required for the DataFacet Service Application user account?

This account must have explicit read and write permissions to the Data Facet term store group, and that is all. We are, however, going to change this behavior in the future.

What set of documents does the "Test" button run against?

DataFacet maintains its own (optional) index of documents that have been checked in to SharePoint. When a document passes through the auto-tagger, it gets added to this index. The number of documents is fairly inconsequential, since searching this index has no discernible effect on the performance of SharePoint. It is a very efficient Lucene index, totally separate from the SharePoint full-text index (either FAST or SharePoint Search). It can support a million documents or so before you have to think about advanced scaling configurations.
Note that this index, by default, can take up significant space because document text is stored for quick preview purposes. The index size can be reduced significantly through a variety of techniques, if size becomes an issue.

When are new rules (or rule changes) applied?

There are two distinct events. One is On Document Update, the other is On Term Store Update. For the On Document Update event, we add a delegate to the built-in SharePoint API et voila, all documents checked into the library are annotated. The On Term Store Update event is a different story. Since each document must be read in order to annotate it, it is an I/O intensive process, so it is inherently best done in batch mode in the background. Currently, we have a PowerShell script that can be scheduled to re-annotate documents in a given library.

Where is the taxonomy information stored for documents?

The terms get written to the index of the document that is stored in SharePoint but it does not change the contents of the document itself. The original document is never updated, only the SharePoint metadata.

Where tags are assigned to Records, does DataFacet ensure that updates are effectively recorded in the Audit event?

DataFacet integrates well with records management systems out of the box by virtue of the support for the SharePoint Managed Keywords feature. We do provide audit logs through the normal SharePoint logging system, but there is no specific integration with records management at this time.

How does the DataFacet automatic tagging affect the performance of SharePoint?

DataFacet is designed to run on existing SharePoint hardware and has a minimal impact on SharePoint performance. Ongoing day-to-day document check-in should not cause any noticeable change to SharePoint performance. If you plan to tag a large number of documents at a time, it is advisable to do this over a weekend or during other off-peak hours.

How does the DataFacet Automatic Tagging Engine deal with large site collections with thousands of documents?

DataFacet can tag documents one at a time when checked into SharePoint. DataFacet can also tag multiple documents checked in or can tag entire collections of thousands of documents. DataFacet is fast and scalable even on commodity hardware for SharePoint 2010.

Is 2 WFEs with 2 CPUs each and 16GB of RAM sufficient to run DataFacet?

In general terms, DataFacet is much lighter-weight than SharePoint. Chances are if you meet the minimum requirements to run SharePoint, you automatically have the minimum requirements to run DataFacet. The biggest variable is disk space for the intermediate Lucene index. If you have a lot of documents, you will need a fairly large disk drive to handle the full text index. Remember, however, that document text is usually much smaller than its container. The actual text in a 1MB PDF file is often just a few KB. So if your documents contain a lot of formatting overhead (PDF, Graphics, Multimedia), then the ratio of stored text to source document size can be quite small.

Is there any general rule of thumb for the size of the database that will be created based on how much content?

DataFacet does not use a SQL database, so there is no reason to allocate storage on a database server. We do maintain a local full-text index that is used for testing. RebelSearch uses a similar index as well, but both are self-contained data stores.

As a rule of thumb, the internal data storage will depend heavily on the character of the incoming documents. It will be some percentage of the documents in the repository, depending on the ratio of non-text formatting to actual text. We only store the text part of a document, so markup and images are completely discarded. There is no way to pre-calculate the percentage, but an estimate can be made based on the length and type of documents. PDF images from scanned sources will have a very low text/format ratio: a 100MB file could easily have only 1KB of text. Word documents are often mostly text, so they would have a fairly high text/format ratio: a 100KB Word file might have 80KB of text, if there are no images in the document. You can extrapolate to other document formats.

A safe rule of thumb would be to have 100% of the document source size available for the index. So, if you have 100GB of data, you should have at least 100GB of local storage for the index.
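
The arithmetic behind that rule of thumb is simple enough to sketch. The function below is illustrative only (its name and the text_ratio parameter are made up). It multiplies the source size by an assumed text-to-container ratio, defaulting to the conservative 100% figure, and the 0.8 ratio mirrors the Word example above.

```python
def index_storage_gb(source_gb, text_ratio=1.0):
    """Estimate local index storage from source size and an assumed text ratio.

    text_ratio is the fraction of each document that is actual text;
    1.0 is the conservative rule of thumb (100% of source size).
    """
    if source_gb < 0 or not 0 <= text_ratio <= 1:
        raise ValueError("source_gb must be >= 0 and text_ratio within [0, 1]")
    return source_gb * text_ratio


# Conservative sizing for 100GB of source documents.
print(index_storage_gb(100))

# Tighter estimate for a repository of mostly-text Word documents (80KB of text per 100KB).
print(index_storage_gb(100, text_ratio=0.8))
```

The tighter estimates are only as good as the assumed ratio, so the conservative default is the safer number for capacity planning.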

What can we, and what should we, represent as solid performance numbers?

The numbers we have come from RebelSearch crawling about 1500 documents on commodity hardware running under a virtual machine. The mix of documents was from an internal Renaissance IT data set in English and Russian, with a fairly good mix of content types. This is a total throughput number, meaning that it includes crawling, filtering, tagging and indexing into Lucene. Alexey estimates that AR Classifier contributes about 50% of the latency in the process, meaning that without the classifier stage, the indexer is about twice as fast.

These are legitimate numbers, but they are not rigorous numbers. Also, they do not apply to SharePoint Search or FAST search, which will have completely different metrics that we haven't tested yet. We can probably get real-world numbers from IMF for indexing about 30,000 mostly PDF and Office documents on FAST, but again the numbers will include several other items, including SeeUnity connectors, filtering and indexing in addition to tagging.

Theoretically, we are scalable to much better numbers simply by throwing hardware or cloud infrastructure at the problem. We also have an Azure cloud implementation of the classifier that could be re-animated in a week or so if we got a really big sale, or needed to meet some kind of performance criteria.

In my experience with Verity and Autonomy, it is a bad idea to quote any kind of performance numbers that have not been empirically tested with real customer data on customer equipment. There is such a wide variety of document types, taxonomy sizes and hardware configurations that it is impossible to state an actual number with any certainty, since we have not done rigorous performance testing in the SharePoint environment. DataFacet for SharePoint is slower than RebelSearch, but by how much depends on many factors. If we want to get real numbers, it must be a project with resources allocated and a compelling business reason to allocate those resources.

Performance metrics can be very difficult to model outside of a real-world implementation. In my opinion we should be very circumspect about throwing around numbers like "40 documents per second" or even our 200,000 rules per second metric. We have proven both in certain contexts, but things could go very wrong if we start committing to any particular number. A better approach is to ask the customer what their performance requirements are and ensure that we size the implementation accordingly, building in time for benchmarking if necessary. What we can say is that we are damn fast, and that we are damn scalable, and we're up to meeting any performance metric challenge that we get.

What is the likely "overhead" the annotator has on the SharePoint infrastructure specifically, search and DB size?

Very little impact. Our annotator is very fast, with the ability to process 200,000 queries per second on commodity hardware. In a general sense, the annotator does not contribute any perceptible overhead to the ingestion process, except possibly for the first document which populates the caches.

What is the size of the DataFacet Lucene index that gets created?

That is somewhat configurable. By default, we create a fairly large index by storing all document text in the Lucene index file. The reason we do this is to allow preview of result documents directly from the stored text in the index. So, if you find your Lucene index is getting too big, we can configure it to store less information and show truncated document previews in the taxonomy manager.

Are re-used terms across groups updated synchronously?

Yes, re-used terms are updated across term groups. However, it should be noted that adding a narrower term to one does not automatically add the same narrower term to another. Only the term itself is synchronized. Also, be careful not to confuse copying a term with reusing a term.

Are the taxonomies available in multiple languages?

Yes, taxonomies are available in multiple languages, including German, Spanish, French, Italian, Portuguese, Traditional Chinese, Simplified Chinese, Japanese, Korean, and Vietnamese. Limited parts of the taxonomies are translated into Swedish and Finnish. Taxonomies can be translated into any language quickly (using human translators).

Can DataFacet be used for field mapping?

If files are named based on the type of content in each, can these documents be imported into SharePoint so that the file names are applied as metadata to each document, knowing that there is no taxonomy beyond the file name?

The name of a file is metadata by definition, so by default all documents will have that metadata stored in the path field when they are imported. The next logical step is to classify a document according to its file name. In fact, SharePoint has that feature built in:
http://blogs.technet.com/b/speschka/archive/2009/10/30/sharepoint-2010-content-organizer-part-1-a-cool-new-feature-for-managing-your-content.aspx
http://msdn.microsoft.com/en-us/library/ee558288.aspx
DataFacet can work with Content Organizer to augment the classification with auto-tagging based on document content. Both features work together to provide the ultimate classification and tagging combo.

Do DataFacet rules handle proximity operators (near queries)?

EXAMPLE: Suppose a medical client wants to tag head and neck neoplasms. They want "head" and "neck" to be in the same sentence and "neoplasm" to be in the title. At the very least, can the tag be designated as title-specific? Proximity searches are available to some extent (not so much "same sentence" as "within x characters"). Both proximity ("head neoplasm"~10 OR "neck neoplasm"~10) and field (title: "head neoplasm"~10) queries are supported.

Do DataFacet's customers combine the term tagging of DataFacet with natural language capabilities?

Currently, no.

Does DataFacet have the option to copy a term or reuse it?

We do not model the "Reuse" option, because a reused term is essentially a "reference" term, which we do not currently support in our UI. Each term is unique, even if it shares a name with another term. For example, /Animals/Bears is completely different from /Football Teams/Bears. Even though the name "Bears" is the same, they are completely different objects.
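The distinction between a term's name and its identity can be illustrated with a small sketch. The `Term` class and its fields below are hypothetical and only model the idea; they are not DataFacet's or SharePoint's actual data structures.

```python
from dataclasses import dataclass, field
import uuid


@dataclass(frozen=True)
class Term:
    """Hypothetical term: identity comes from its id, not from its name."""
    path: str
    term_id: uuid.UUID = field(default_factory=uuid.uuid4)

    @property
    def name(self):
        # The display name is just the last segment of the path.
        return self.path.rsplit("/", 1)[-1]


animal_bears = Term("/Animals/Bears")
team_bears = Term("/Football Teams/Bears")

# Same display name, but two separate objects with separate ids.
print(animal_bears.name == team_bears.name)
print(animal_bears == team_bears)
```

Under this model, renaming or tagging with one "Bears" has no effect on the other, which is the behavior described above.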

How can I handle "Folksonomy" with DataFacet?

There are two different ways to add a Folksonomy. Folksonomy terms are added to the document's searchable text but will not add nodes to the structured taxonomy hierarchy or document display.

Process 1: Unstructured Folksonomy Term List (uncurated, user-generated, unstructured keywords)

Setup Process:
  • From your SharePoint instance, select the library to which you wish to add the un-curated column.
  • Select Library Tools -> Library -> Library Settings.
  • Down in the columns section, click 'Create column'.
  • Type in a column name; for this example you can just use 'Folksonomy Keywords'.
  • Choose the type to be Multiple Lines of Text.
  • Click 'OK'.

Check-in Process / Edit Document Properties

During document check-in there will now be a column for Folksonomy Keywords.
  • The user can now add additional lines of text for the document that can be searched after the document is indexed.
  • Hit 'Ok' to add the meta-tag to the document and save the document.
(note: The keywords may not be immediately available for search until the indexer has indexed the document. Check your search application settings to see how often your index is updated).

Process 2: Structured Folksonomy (uncurated, user-generated taxonomy with structured nodes)

Rather than just typing in keywords, you can leverage managed meta-tags and physically create new nodes in an un-curated managed meta-tags column.

Setup Process:
  • From your SharePoint instance, select the library to which you wish to add the un-curated column.
  • Select Library Tools -> Library -> Library Settings.
  • Down in the columns section, click 'Create column'.
  • Type in a column name; for this example you can just use 'Folksonomy'.
  • Choose the type to be Managed Metadata.
  • Check 'Allow multiple values'.
  • Select the radio button for 'Customize your term set:'
  • Make sure the 'Yes' option for 'Allow 'Fill-in' choices:' is selected.
  • Click 'OK'
(note: The nodes may not be immediately available for search until the indexer has indexed the document. Check your search application settings to see how often your index is updated).

Check-in Process / Edit Document Properties

During document check-in there will now be a column for Folksonomy.
  • The user can click the double-tag icon (on the right-hand side of the box), which will load the meta-tag screen.
  • Click the 'Add New Item' in the upper right.
  • Type in the name of the new node. Select/Highlight the new node, click the 'Select >>' button in the lower left to add it to the list.
  • Hit 'Ok' to add the meta-tag to the document and save the document.

How do changes to the translations impact the taxonomy if multiple languages are licensed and loaded into SP2010?

Currently, changes to one translation of a taxonomy will not change the translation of the term in another taxonomy, as they are separate objects.

How do proximity queries work?

A user may wish to use the proximity operators. Is there something about the query language that has changed and hides the proximity ~n operator? Example: a search for "Motion during the 2011 year" returns 1 result, but "Motion 2011 year"~8 returns 0 results. Is the proximity operator working as intended, or is something rewriting it?

How do the proximity operators work in DataFacet rules?

DataFacet supports finding words that are within a specific distance of each other. To do a proximity search, use the tilde symbol, "~", at the end of a phrase. For example, to search for "bean" and "coffee" within 10 words of each other in a document, use the search:
 "coffee bean"~10
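To make the idea of "within N words" concrete, here is a minimal Python sketch of a proximity check. It is only an illustration of the concept, not DataFacet's matching code: the function names are made up for this example, the tokenizer is a simple assumption, and a real query engine may count distance differently (for instance, by charging extra for reversed word order).

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def within_distance(text, first, second, max_words):
    """Return True if `first` and `second` occur with at most
    `max_words` other words between them, in either order."""
    tokens = tokenize(text)
    first_positions = [i for i, t in enumerate(tokens) if t == first.lower()]
    second_positions = [i for i, t in enumerate(tokens) if t == second.lower()]
    return any(
        abs(i - j) - 1 <= max_words
        for i in first_positions
        for j in second_positions
        if i != j
    )


# Three words separate "coffee" and "bean" here, so a limit of 10 matches.
print(within_distance("Fresh coffee from a roasted bean", "coffee", "bean", 10))
```

A document where the two words are further apart than the limit, or where one of them is missing, would not match.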

How can I tell if the DataFacet Service Application user account is valid?

If the account is not valid, the application pool will fail to start and there will be an error entry in the Event Log, something like: ...this pool failed to start because login credentials are invalid...

How do I fix "Error 503" when accessing the DataFacet taxonomy Service Application?

The error you saw is not entirely uncommon. We're working on a troubleshooting guide to address it, but in the meantime, here are some steps you can take on your own to troubleshoot. The "Error 503" means that the DataFacet web service application is not available for some reason. The most common reason is that the application pool associated with the web service application cannot be started because of invalid credentials. If you installed DataFacet under a specific account and later changed the password or privileges of that account, the DataFacet application service will not start because its application pool will not start.
  1. Make note of the time. If you can afford the luxury, you may want to clear out the system and application event logs to make it easier to spot error messages, but it is not necessary at all.
  2. Run IISRESET to try to restart all of the SharePoint application services.
  3. After IISRESET has run, make sure that SharePoint Central Administration and the Managed Metadata service are both up and running by visiting them in the browser.
  4. Click on the DataFacet service application link. If the page shows, you probably just needed a remedial IISRESET and you are good to go.
  5. If you get the 503 error again, go to the event log and look for a succession of two warnings and an error from the WAS source. The error will be something like "Application pool xxxxxxxxxxxxxxxxxxxxxxxx has been disabled. Windows Process Activation Service (WAS) encountered a failure when it started a worker process to serve the application pool". The two warnings will hint at why the application pool failed to start, probably having to do with authentication.
  6. Now that you know the identity of the application pool (memorize the first few characters so it is easy to find), visit IIS Administration and verify that the application pool is indeed stopped.
  7. To change the application pool's credentials, select the application pool and click on "Advanced Settings". This will bring up a dialog box with application pool properties. Under the Process Model section, click on the [...] icon to change the Identity value. Enter valid credentials for the application pool user. The application pool user must have read-write access to the SharePoint term store, specifically the Data Facet term store group.
After that, DataFacet should work again.

I am getting an error similar to "DataFacet Taxonomy Tree Refinement Panel. One of the properties of the Web Part has an incorrect format. Microsoft SharePoint Foundation cannot deserialize the Web Part." How can I fix it? I was told "Check the format of the properties and try again."

This issue sometimes appears after a product upgrade. Basically, the .webpart files stored in the Web Parts gallery become outdated. To upgrade them:
  1. Go to Site Actions -> Site Settings.
  2. Click on "Web parts" under "Galleries".
  3. Find and delete the following files: CurrentRefinementConnectionProvider.webpart, CurrentRefinementsWebPart.webpart, TaxonomyKeywordListRefinementWebPart.webpart, TaxonomyTreeRefinementWebPart.webpart.
  4. Return to the Site Collection Settings page.
  5. Click on "Site collection features" under "Site Collection Administration".
  6. Reactivate (deactivate, then activate) the feature called "DataFacet Annotator enabled on this Site Collection".
That should force new .webpart files to be uploaded into the Web Parts gallery and resolve the deserialization issue.

My Rules don't seem to be matching properly. I think I am getting more hits than I should.

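Check that the operators in your rule (for example, OR) are written entirely in uppercase. A lowercase operator such as "or" is treated as a stop word rather than as an operator, which can make the rule match more documents than intended.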
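A quick way to spot this problem is to scan a rule for operator words that are not fully uppercase. The sketch below is a hypothetical helper, not part of DataFacet; it assumes the operator keywords are AND, OR and NOT, as in Lucene-style query syntax, and it ignores anything inside quoted phrases.

```python
import re

# Assumed operator keywords, following Lucene-style query syntax.
OPERATORS = {"AND", "OR", "NOT"}


def find_lowercase_operators(rule):
    """Return operator-like words outside quoted phrases that are not fully uppercase."""
    # Drop quoted phrases so that words inside them are not flagged.
    unquoted = re.sub(r'"[^"]*"', " ", rule)
    words = re.findall(r"[A-Za-z]+", unquoted)
    return [w for w in words if w.upper() in OPERATORS and w != w.upper()]


# The lowercase "or" between the two phrases is flagged.
print(find_lowercase_operators('"head neoplasm"~10 or "neck neoplasm"~10'))
```

An empty result means every operator-like word in the rule is already uppercase.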

PDF Files are not being annotated, even though I have Acrobat installed on the server.

The Adobe web site seems to suggest that the stand-alone iFilter is not required if you install the latest Acrobat Reader. This is true for desktop search, but for server products like SharePoint, you must still install the Adobe iFilter for 64-bit platforms. http://www.adobe.com/support/downloads/thankyou.jsp?ftpID=4025&fileID=3941

What do I do when I get the error message "Error retrieving list of taxonomies: Term group Data Facet does not exist"?

To restore the DataFacet term group:
  1. Open your term store management page (Central Administration -> Manage Service Applications -> Managed Metadata Service).
  2. Expand the Managed Metadata Service context menu and click New Group.
  3. Enter "Data Facet" (without the double quotes) and hit Enter.
  4. Ensure that Group Managers includes at least these accounts: the account used to run the Central Admin app pool, the account used to run the DataFacet taxonomy service app pool, and your account.
  5. Run iisreset.
  6. Check the "manage taxonomies" page to ensure that you can now see an empty list of taxonomies without any errors.

What effect will changing the SharePoint Administrator password have on DataFacet?

It depends. First, if DataFacet uses a different account to run its service application, there is no effect. Second, if the password was updated through the SharePoint interface, there are no issues either. The only problem arises if the password was changed outside of SharePoint (and IIS); in that case the IIS application pool will fail to start.

If a document is checked out by the user and the annotator is run, what will happen?

The annotator will still index and tag.

Can term groups and/or sets be used to create content types that can be used across an organization?

Term groups and sets cannot be used as content types. However, SharePoint 2010 does enable organizations to share content types across the organization using the "Content Type Hub". Essentially, a single site collection is designated as a Content Type Hub, and content types that are part of that hub can then be syndicated to other site collections.

Can users select more than one tag, if yes can they be from different term sets?

The answer to both of these questions is yes. You can set up a managed metadata column to allow users to select single values or multiple values. This is an option that is selectable when the column is created. Also when the column is created, you are able to define which term sets the column will be populated with. You can select a single term set or multiple term sets.

Can you create multi-language taxonomies in an Excel file to be imported into SharePoint?

The SharePoint 2010 import format does not support importing multiple languages, nor does it support importing synonyms (which are called Labels). Unfortunately, translations and synonyms must be entered manually in the term store.

Could limiting the number of DataFacet admins possibly become short-term full-time work as it is ramped up for use across the enterprise?

It's more likely to be longer-term part-time work, since it takes time to build out and improve the taxonomies: scheduling people's time to review and upgrade suggested terms, adding new term groups and term sets, and so on. This is usually why companies don't hire FTE taxonomists: there usually isn't constant demand. The work tends to go in stops and starts.

Do you offer professional services in Europe?

Yes, we have worked in London and Switzerland on taxonomy projects for company portals and intranets, and can be available to do the same for other companies in Europe.

How accurate is AutoClassification?

Since our engine is fully deterministic (I think Jeff Fried uses the term "Habitable"), we have 100% accuracy: the rules either match or they don't.

How are Enterprise keywords applied? How do users enter a keyword?

Enterprise keywords are keywords which are not part of a taxonomy term set and which have been entered by users. Users can enter these tags into a “Tags and Notes” field or into a managed metadata field if they don’t find an existing term that meets their needs. These keywords should be monitored to capture terminology that end users are using. Once these keywords have been entered, they can be left as enterprise keywords or they can be promoted to a managed metadata term set. It should also be noted that there is a concept of Enterprise Keywords column in SharePoint 2010. This is a predefined column that can be applied to content types. Potential values for the Enterprise Keywords column are all managed metadata values as well as any managed keywords that users have previously entered.

How do you manage non preferred terms?

Non-preferred terms, or synonyms, are added to SharePoint 2010 as "Labels". These can be entered manually in the term store. Users who search for these synonyms will be directed to the preferred term.

How does DataFacet treat search "Scopes" and "Security Trimming"?

"Scopes" is a feature of the SharePoint search engine and we fully respect them at search results time.
"Scopes" is a way of restricting the domain over which a user query is executed.   Since "Scopes" are applied as a kind of "filter query", the search results have already been restricted by the time they reach our navigation control.  This is the same way we interact with "Security Trimming" in SharePoint.
Note that we do not support "Scopes" in the DataFacet taxonomy editor test search, only the end user SharePoint or FAST search.  You can control subsets of documents in the taxonomy editor through DataFacet configuration and we have a more friendly interface scheduled in the product road map (it has not been assigned to a specific release, however).

How many rules need to be customized after the Taxonomy in the MetaStore is in place and ready for consumption?

It is worth looking at all of the rules to make sure things are tagging the way you like, but if you have a limited amount of time to review the rules, you should prioritize:
  1. Rules for terms that need to be tagged with a higher degree of accuracy.
  2. Higher-level terms, which may match too many documents and need to be focused and narrowed down.
  3. Lower-level terms, which may match very well based on the keyword name alone, but where it is worthwhile to make sure that synonyms are included in the rule.

How would you describe the difference between AutoClassifying at index time, and FAST’s Entity Extraction?

FAST Entity Extraction rules are somewhat limited.   They only work for exact matches of dictionary terms.   AutoClassifier supports a full set of rules, including proximity, stemming, case variants, wildcards and regular expressions.   FAST entity extraction does not have the friendly taxonomy editor interface.
They are not mutually exclusive however.  FAST entity extraction can be used with AutoClassifier if required.
Also, FAST entity extraction has no relation to the term store.  So you have to manage those terms independently of SharePoint.    It's not as convenient as doing the same thing with AutoClassifier.

If a rule is changed, can retagging be done for just that rule?

No. Retagging cannot be done for individual rules only. When retagging, the bottleneck is the document check-in/check-out rather than applying the rules.

If users drag-and-drop documents from a file share into Windows Explorer integration with SharePoint, will they be tagged?

Yes.  We fully support any method of adding documents to SharePoint, whether it is checked-in through the web interface or dragged-and-dropped through the WebDAV connector to Windows Explorer.  We also support direct check-in through Microsoft Office integration and will soon support Colligo check-in.

Can the software be updated so that taxonomies can be exported to SKOS?

Mark will have to check with George on this to see how easy it would be to update the code. The customer would also be interested in some guidelines for performance testing. There are concerns about tagging all 3.2 million of their documents at once and how long that would take. They would also like to get the PowerShell scripts to automate the triggering of tagging.**

Is document tagging applicable if Check in/Check out is disabled?

If content is migrated from 2007 and has existing content types and site columns, DataFacet will initially be turned off. These values will be held in a "legacy column" and will not be in the term store, but the term sets may include these values.

Is it a good practice to let Site Stewards have free rein on creating Local Custom Terms?

If the local custom term sets are confined to the site collection, this shouldn't be a major problem. The main taxonomist should review these localized term sets at least once a month to determine whether any should be promoted to preferred terms or term sets for the entire organization. Any custom terms that are approved must be moved to the main term set and then reused back into the set from which they came. We need to confirm whether or not a term moved from a custom term set, or an enterprise keyword, keeps the same GUID. **

Is it preferable to establish a DataFacet-specific column for all the libraries and then push it out locally?

It is recommended to have separate columns for each term set instead of clumping them all into one.

Should the number of admins of DataFacet be kept to a minimum (2-3 or 1) rather than a larger number (10-12)?

Limit the number of DataFacet admins to fewer than the number of SharePoint term store admins to reduce the chance of errors. 2-3 is good, since it avoids the situation where no admin is around to make changes (e.g. on vacation, no longer with the company, etc.). There will probably be more SharePoint term store admins, because these people can and should be making changes at the more local level but don't need to be directly responsible for the DataFacet solution.

Should the taxonomy be created to cater to users? If so, how do you relate the user’s taxonomies to the company's file plan for retention and disposition?

Taxonomies should always relate to the users, because users are the ones searching for information. The company's file plan can be used as a starting point to set up the initial taxonomies, because users are already accustomed to organizing information this way.

What is the best way to represent a "Region, District, Area, Zone" type hierarchy in SP2010? Is this a good, bad, or so-so thing to do?

If "Region," "District," "Area," or "Zone" can be used across multiple business areas, then they should be separate term sets. Otherwise, build one term set for "Regions" and put the other sets beneath it.

When a rule change is made, can that change be communicated to the site stewards so that they can manually trigger a retag of libraries to take advantage of the new rules?

Yes, this would be a valid approach.

When DataFacet is run after Check in/Check out is disabled, what will happen?

DataFacet will simply tag as normal and it does not matter if DataFacet tags a term that is already in the legacy metadata.

When is it best to clone a taxonomy so that it can be applied to another area?

Once a term set is relatively "stable", it can be safely copied from one term group to another. No major additions should be made after that point, because newly added terms are not replicated in copied term sets.

Can taxonomies be edited from a browser – i.e. not via the Central Admin Service Apps?

Yes. We have a stand-alone HTML5 application that can be hosted in Adobe AIR or in any modern browser that supports HTML5. It has the same features as the Central Admin version and is based on the same JavaScript code base.