Monday, October 01, 2007

How to index PDFs in MOSS 2007 document libraries

SharePoint document libraries are phenomenal tools for collaborative environments where files are shared. And SharePoint's ability to search files in document libraries makes finding files easy. Well, unless the document is a non-Microsoft file type, such as the ever-present PDF file.


The sad fact of the matter is that Windows SharePoint Services (WSS) 3.0 and Microsoft Office SharePoint Server (MOSS) 2007 can't index PDFs by default. That's not news to many veteran SharePoint professionals. Nor is the fact that you can add an icon for PDFs, reindex existing documents, and so forth. However, many administrators are new to SharePoint, and will hit their heads hard against this problem. I was disappointed to see that, despite extensive searching on Google, I could find no single, authoritative, and (most importantly) complete guide to solving it.

The "bottom line" is that you must install an iFilter for PDFs on your SharePoint servers--specifically, any server that performs search, which would be all WSS servers and your MOSS search server. iFilters are plug-ins that enable indexing of file types. Although iFilter is a Microsoft specification, it is generally through vendors or third parties that you'll get iFilters--not through Microsoft itself.

After you add the iFilter, you must configure SharePoint to index the file type (.PDF). But then, you still have two problems. The biggest is that SharePoint will index only files that are newly added, or existing files whose properties change. So SharePoint will not index existing PDFs when you add the PDF iFilter. You must rebuild your index. The second challenge, purely a cosmetic one, is getting SharePoint to display an appropriate icon for PDFs.

This installment will focus on 32-bit WSS servers. In my test library, a Word document and a PDF both contain the word "iFilter", but a search returns only the Word document. Now, let's fix the problem!

1. You will need two downloads:
- The Adobe PDF iFilter version 6.0, available from Adobe.
- An icon for PDFs, also available from Adobe. Check their licensing page, then download the gif.


2. Install the iFilter. Note: Many guides on the Internet suggest shutting down Microsoft IIS or the Shared Service Provider (SSP) or the WSS application(s). I found this was not necessary, and Microsoft's own KB article 927675 did not specify it was necessary.


3. Add a registry entry for the .pdf extension in the key HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Applications\\Gather\Search\Extensions\ExtensionList.
- Open the registry editor.
- Navigate to HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Applications\\Gather\Search\Extensions\ExtensionList\.
- Identify the highest "number" value in the key. On a default installation of WSS, the highest entry is 37. Note that the values are not sorted in numeric order, because registry value names are strings.
- Create a registry value for the next number (e.g. 38) by choosing Edit > New > String Value, then naming the value with that number.
- Double-click the value you just created and, in the Value Data box, type: pdf. Note there is no dot preceding the extension.
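If you would rather script step 3 than click through the registry editor, here is a minimal Python sketch of the "find the highest number, then add the next one" logic. It is only an illustration, not something from Adobe or Microsoft: the function names are my own, and the full key path is left as a parameter so you can copy it from the registry editor on your own server. The second function works only on Windows and needs administrative rights.

```python
def next_extension_name(value_names):
    """Return the next free numeric value name, as a string.

    Registry value names are strings, so "10" sorts before "9" in the
    registry editor. Compare them as integers instead.
    """
    numbers = [int(name) for name in value_names if name.isdigit()]
    return str(max(numbers, default=0) + 1)


def add_pdf_extension(extension_list_key_path):
    """Add "pdf" under the next free number in the ExtensionList key.

    Windows only. extension_list_key_path is the path below
    HKEY_LOCAL_MACHINE (hypothetical parameter, supplied by you).
    Returns the new value name, or None if pdf was already listed.
    """
    import winreg  # imported here so the helper above also runs off Windows

    access = winreg.KEY_READ | winreg.KEY_SET_VALUE
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE,
                        extension_list_key_path, 0, access) as key:
        value_count = winreg.QueryInfoKey(key)[1]
        entries = [winreg.EnumValue(key, i) for i in range(value_count)]
        # Do nothing if pdf is already registered.
        if any(str(data).lower() == "pdf" for _name, data, _type in entries):
            return None
        new_name = next_extension_name(name for name, _data, _type in entries)
        winreg.SetValueEx(key, new_name, 0, winreg.REG_SZ, "pdf")
        return new_name
```

On a default installation where the highest entry is 37, next_extension_name returns "38", which matches the manual steps above. Export the key as a backup before letting any script write to it.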


4. There are two registry keys with specific values that must exist. Verify that these exist and, if not, create them:
- HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf
  - Value Name: Default; Type: REG_MULTI_SZ; Data: {4C904448-74A9-11D0-AF6E-00C04FD8DC02}
- HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Setup\Filters\.pdf
  - Value Name: Default; Type: REG_SZ; Data: (value not set)
  - Value Name: Extension; Type: REG_SZ; Data: pdf
  - Value Name: FileTypeBucket; Type: REG_DWORD; Data: 0x00000001 (1)
  - Value Name: MimeTypes; Type: REG_SZ; Data: application/pdf
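The check in step 4 can also be written down as data and verified by a script. The following Python sketch is one possible way to do it, under my own assumptions: the names EXPECTED, find_problems, and read_actual are made up for this example, and the Default value of the second key is left out of the check because it is meant to be unset. The reader function is Windows only and does not change anything in the registry.

```python
SETUP = r"SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Setup"
CONTENT_INDEX_KEY = SETUP + r"\ContentIndexCommon\Filters\Extension\.pdf"
FILTERS_KEY = SETUP + r"\Filters\.pdf"

# Value name -> (type, data). The empty name "" is the key's Default value.
EXPECTED = {
    CONTENT_INDEX_KEY: {
        "": ("REG_MULTI_SZ", ["{4C904448-74A9-11D0-AF6E-00C04FD8DC02}"]),
    },
    FILTERS_KEY: {
        "Extension": ("REG_SZ", "pdf"),
        "FileTypeBucket": ("REG_DWORD", 1),
        "MimeTypes": ("REG_SZ", "application/pdf"),
    },
}


def find_problems(expected, actual):
    """Compare two {key: {name: (type, data)}} maps and list the differences."""
    problems = []
    for key_path, values in expected.items():
        if key_path not in actual:
            problems.append(f"missing key: {key_path}")
            continue
        for name, wanted in values.items():
            found = actual[key_path].get(name)
            if found != wanted:
                label = name or "(Default)"
                problems.append(
                    f"{key_path}: {label} is {found!r}, expected {wanted!r}")
    return problems


def read_actual(expected):
    """Read the current values from HKEY_LOCAL_MACHINE (Windows only)."""
    import winreg

    type_names = {winreg.REG_SZ: "REG_SZ",
                  winreg.REG_MULTI_SZ: "REG_MULTI_SZ",
                  winreg.REG_DWORD: "REG_DWORD"}
    actual = {}
    for key_path, values in expected.items():
        try:
            key = winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path)
        except OSError:
            continue  # key does not exist; find_problems reports it
        with key:
            actual[key_path] = {}
            for name in values:
                try:
                    data, value_type = winreg.QueryValueEx(key, name)
                except OSError:
                    continue  # value does not exist
                type_name = type_names.get(value_type, str(value_type))
                actual[key_path][name] = (type_name, data)
    return actual
```

On the server, find_problems(EXPECTED, read_actual(EXPECTED)) returns an empty list when everything in step 4 is in place, and one line per missing key or mismatched value otherwise.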


5. Restart the Windows SharePoint Services Search service. Open a command prompt. Type net stop spsearch, then net start spsearch. If you perform a search now, existing PDFs will still not be returned, but newly added PDFs will appear in search results once SharePoint has indexed them. If you modify any property of an existing PDF, it will be indexed too. But who wants to modify all existing PDFs in a document library? This is where I found a lot of misinformation online. Even Microsoft's KB 927675 didn't suggest the right solution! It's easy! STSADM, SharePoint's ubercommand, to the rescue!


6. Rebuild the WSS search index.
- Open a command prompt.
- Navigate to Program Files\Common Files\Microsoft Shared\web server extensions\12\BIN and type the following commands:
stsadm.exe -o spsearch -action fullcrawlstop
stsadm.exe -o spsearch -action fullcrawlstart
The existing PDFs will, after being indexed, appear in search results. But they will still not have correct icons. So, while your site is being indexed, keep going with these steps to configure the icon.
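If you have several WSS servers to fix, the service restart from step 5 and the two stsadm commands from step 6 can be wrapped in a small script. This Python sketch only runs the same commands shown in this post, in the same order. The C: drive letter and the function names are my assumptions, so adjust BIN_DIR to match your installation.

```python
import subprocess
from pathlib import PureWindowsPath

# Folder from step 6. The C: drive letter is an assumption.
BIN_DIR = PureWindowsPath(
    r"C:\Program Files\Common Files\Microsoft Shared\web server extensions\12\BIN")


def rebuild_commands(bin_dir=BIN_DIR):
    """Return the commands from steps 5 and 6, in order, as argument lists."""
    stsadm = str(bin_dir / "stsadm.exe")
    return [
        ["net", "stop", "spsearch"],
        ["net", "start", "spsearch"],
        [stsadm, "-o", "spsearch", "-action", "fullcrawlstop"],
        [stsadm, "-o", "spsearch", "-action", "fullcrawlstart"],
    ]


def run_all(commands):
    """Run each command on the server and report its exit code."""
    for command in commands:
        print("running:", " ".join(command))
        result = subprocess.run(command)
        print("exit code:", result.returncode)
```

Call run_all(rebuild_commands()) from an administrative prompt on the WSS server. The script prints exit codes instead of stopping on the first failure, so check the output before assuming the crawl has started.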


7. Open the folder Program Files\Common Files\Microsoft Shared\Web Server Extensions\12\Template\Images.


8. Copy the gif you downloaded in Step 1 into the folder.


9. Open the folder Program Files\Common Files\Microsoft Shared\Web server extensions\12\Template\Xml.


10. Right-click the file docicon.xml and choose Open With and select Notepad.


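11. In the <ByExtension> element, you'll see a number of <Mapping> elements. You will add one for pdf. It does not have to be in alphabetical order. The element you need to add takes the following form, with Value set to the file name of the gif you copied in Step 8 (the file name below is only a placeholder):
<Mapping Key="pdf" Value="your-pdf-icon.gif"/>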


12. Save that file and close Notepad.

Now, the moment of truth. A search now provides the results.
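The docicon.xml edit in steps 10 through 12 can be scripted as well. The sketch below uses Python's standard ElementTree module and assumes the usual docicon.xml layout, in which Mapping elements with Key and Value attributes sit inside a ByExtension element. The function name, the sample XML, and the gif file name are placeholders of mine, not the real contents of your file.

```python
import xml.etree.ElementTree as ET

# A cut-down stand-in for docicon.xml, for illustration only.
SAMPLE = """<DocIcons>
  <ByProgID/>
  <ByExtension>
    <Mapping Key="doc" Value="icdoc.gif"/>
  </ByExtension>
  <Default>
    <Mapping Value="icgen.gif"/>
  </Default>
</DocIcons>"""


def add_icon_mapping(docicon_xml, extension, gif_name):
    """Return the XML text with a Mapping for the extension added or updated."""
    root = ET.fromstring(docicon_xml)
    by_extension = root.find("ByExtension")
    if by_extension is None:
        raise ValueError("no ByExtension element found")
    for mapping in by_extension.findall("Mapping"):
        if mapping.get("Key", "").lower() == extension.lower():
            # Already mapped: just point it at the new icon.
            mapping.set("Value", gif_name)
            break
    else:
        ET.SubElement(by_extension, "Mapping",
                      {"Key": extension, "Value": gif_name})
    return ET.tostring(root, encoding="unicode")


updated = add_icon_mapping(SAMPLE, "pdf", "your-pdf-icon.gif")
print(updated)
```

To use it on a server, read the real docicon.xml into a string, pass it through add_icon_mapping with the name of the gif you copied in Step 8, and write the result back after making a backup copy of the original file.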

Comments:

Anonymous said...

I've both of these but I am still finding that the actual content inside my PDF files is not being indexed. Where should I start looking?

Toby - topedia.net said...

Hey, this is a great post, but you forgot step 13: at the end you must run an IISReset, or else the PDF icon is not activated.

Anonymous said...

A word to the wise on the icon piece of this (which I experienced). Depending on how you put the GIF icon into the \images directory (copy/paste vs. cut/paste), the GIF file may not inherit the appropriate permissions from the parent directory, thereby denying regular users read permissions to that file. We had a lot of people with PDFs in their document libraries getting repeated credentials prompts because IIS was trying to load that icon in the document list. Eventually they'd get into the library with a missing image placeholder, but it was very interruptive to users. To rectify this, just go into the advanced ACL settings, and check the box for inheritance of the parent object for the GIF file itself.

Anonymous said...

I find that on my MOSS 2007 server there is nothing below the 'Applications' key in the registry:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Applications\

So I can't get this to work, any thoughts?
