You can search for any word or phrase on a Web site by typing it into a
query form and clicking the button that executes the query (for example, the
Execute Query button on the sample query form).
Searches produce a list of files that contain the word or phrase, no matter
where it appears in the text. The following rules apply when formulating
queries:
- Consecutive words are treated as a phrase; they must appear in the same
order within a matching document.
- Queries are not case-sensitive, so you can type your query in uppercase or
lowercase.
- You can search for any word except for those in the exception list (for
English, this includes a, an, and, as,
and other common words), which are ignored during a search.
- Words in the exception list are treated as placeholders in phrase and
proximity queries. For example, a search for “Word for Windows” could
return both “Word for Windows” and “Word and Windows,” because “for” is a
noise word that appears in the exception list.
- Punctuation marks such as the period (.), colon (:), semicolon (;), and
comma (,) are ignored during a search.
- To use specially treated characters such as &, |, ^, #, @, $, (, and ) in
a query, enclose your query in quotation marks (").
- To search for a word or phrase containing quotation marks, enclose the
entire phrase in quotation marks and then double the quotation marks around
the word or words you want to surround with quotes. For example,
"World Wide Web or ""Web""" searches for World Wide Web or
"Web".
- You can insert Boolean operators (AND,
OR, and NOT) and the proximity
operator (NEAR) to specify additional search
information.
- The wildcard character (*) can match words with a
given prefix. The query esc* matches the terms “ESC,” “escape,” and
so on.
- Free-text queries can be specified without
regard to query syntax.
- Vector space queries can be specified.
- ActiveX™ (OLE) and file attribute property
value queries can be issued.
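The quotation-mark rule above can be sketched in a few lines of Python. This is only an illustration of the doubling rule as described in this list; the function name quote_phrase is hypothetical and is not part of Index Server.

```python
def quote_phrase(phrase: str) -> str:
    """Wrap a phrase in quotation marks, doubling any quotation marks inside it.

    Hypothetical helper: it only applies the doubling rule described above.
    """
    return '"' + phrase.replace('"', '""') + '"'


# The example from the rule above: the inner quotes around Web are doubled.
print(quote_phrase('World Wide Web or "Web"'))
```

A form handler could apply such a helper to user input before submitting it, so that operator characters and embedded quotes are treated as part of the phrase.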
Boolean and proximity operators can create a more precise query.
| To Search
For |
Example |
Results |
| Both terms in the same page |
access and basic
—Or—
access & basic |
Pages with both the words “access” and “basic” |
| Either term in a page |
cgi or isapi
—Or—
cgi | isapi |
Pages with the words “cgi” or “isapi” |
| The first term without the second term |
access and not basic
—Or—
access & ! basic |
Pages with the word “access” but not “basic” |
| Pages not matching a property value |
not @size = 100
—Or—
! @size = 100 |
Pages that are not 100 bytes |
| Both terms in the same page, close together |
excel near project
—Or—
excel ~ project |
Pages with the word “excel” near the word
“project” |
Hints:
- You can add parentheses to nest expressions within a query. The
expressions in parentheses are evaluated before the rest of the query.
- Use double quotes (") to indicate that a Boolean or NEAR
operator keyword should be ignored in your query. For example, "Abbott and
Costello" will match pages with the phrase, not pages that match the
Boolean expression. In addition to being an operator, the word “and”
is a noise word in English.
- The NEAR operator is similar to the AND
operator in that NEAR returns a match if both words being
searched for are in the same page. However, the NEAR
operator differs from AND because the rank assigned by NEAR
depends on the proximity of words. That is, the rank of a page with the
searched-for words closer together is greater than or equal to the rank of a
page where the words are farther apart. If the searched-for words are more
than 50 words apart, they are not considered near enough, and the page is
assigned a rank of zero.
- The NOT operator can be used only after an AND
operator in content queries; it can be used only to exclude pages that match
a previous content restriction. For property value queries, the NOT
operator can be used apart from the AND operator.
- The AND operator has a higher precedence than OR.
For example, the first three queries are equivalent, but the fourth is not:
a AND b OR c
c OR a AND b
c OR (a AND b)
(c OR a) AND b
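The precedence rule can be checked with a small truth table. The sketch below uses Python, whose own and/or operators happen to have the same relative precedence. Here a, b, and c are assumed to stand for "the page matches term a", and so on; nothing in the sketch calls Index Server.

```python
from itertools import product


def q1(a, b, c):
    return a and b or c        # a AND b OR c


def q2(a, b, c):
    return c or a and b        # c OR a AND b


def q3(a, b, c):
    return c or (a and b)      # c OR (a AND b)


def q4(a, b, c):
    return (c or a) and b      # (c OR a) AND b


# Every combination of "term matches" / "term does not match".
rows = list(product([False, True], repeat=3))

first_three_agree = all(q1(*r) == q2(*r) == q3(*r) for r in rows)
differing = [r for r in rows if q4(*r) != q1(*r)]

print(first_three_agree)  # True
print(differing)          # [(False, False, True), (True, False, True)]
```

The fourth query differs exactly when c matches but b does not: the first three forms accept such a page, while the fourth rejects it.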
Note: The symbols (&, |, !, ~) and the
English keywords AND, OR, NOT,
and NEAR work the same way in all languages supported by Index
Server. Localized keywords are also available when the browser locale is set to
one of the following six languages:
| Language | Keywords |
| German | UND, ODER, NICHT, NAH |
| French | ET, OU, SANS, PRES |
| Spanish | Y, O, NO, CERCA |
| Dutch | EN, OF, NIET, NABIJ |
| Swedish | OCH, ELLER, INTE, NÄRA |
| Italian | E, O, NO, VICINO |
Note: The NEAR operator can be applied only
to words or phrases.
Wildcard operators help you find pages containing words
similar to a given word.
The query engine finds pages that best match the
words and phrases in a free-text query. This is done by automatically finding
pages that match the meaning, not the exact wording, of the query. Boolean,
proximity, and wildcard operators are ignored within a free-text query.
Free-text queries are prefixed with $contents.
The query engine supports vector space queries. Vector queries return pages
that match a list of words and phrases. The rank of each page indicates how well
the page matched the query.
| To Search For | Example | Results |
| Pages that contain specific words | light, bulb | Files with words that best match the words being searched for |
| Pages that contain weighted prefixes, words, and phrases | invent*, light[50], bulb[10], "light bulb"[400] | Files that contain words prefixed by “invent,” the words “light,” “bulb,” and the phrase “light bulb” (the terms are weighted) |
- Components in vector queries are separated by commas.
- Components in vector queries can be weighted by using the [weight] syntax.
- Pages returned by vector queries do not necessarily match every term in
the query.
- Vector queries work best when the results are sorted by rank.
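As an illustration of the comma and [weight] syntax, the following Python sketch assembles a vector query string from a list of components. The helper name build_vector_query and the choice to quote any component containing a space are assumptions made for this example, not part of the query engine.

```python
from typing import Optional, Sequence, Tuple


def build_vector_query(components: Sequence[Tuple[str, Optional[int]]]) -> str:
    """Join (term, weight) pairs into a vector query string.

    Hypothetical helper: a weight of None means "no [weight] suffix".
    """
    parts = []
    for term, weight in components:
        # Assumption: a component containing a space is a phrase and is quoted.
        text = f'"{term}"' if " " in term else term
        if weight is not None:
            text += f"[{weight}]"
        parts.append(text)
    # Components in vector queries are separated by commas.
    return ", ".join(parts)


query = build_vector_query(
    [("invent*", None), ("light", 50), ("bulb", 10), ("light bulb", 400)]
)
print(query)  # invent*, light[50], bulb[10], "light bulb"[400]
```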
With property value queries, you can find files that have property values
that match given criteria. The properties you can query include
basic file information, such as file name and file size, and ActiveX properties,
including the document summary information that is stored in files created by
ActiveX-aware applications.
There are two types of property queries:
- Relational property queries
consist of an “at” character (@), a property
name, a relational operator, and a property
value. For example, to find all of the files larger than 1 million
bytes, issue the query @size > 1000000.
- Regular expression property queries consist of a number sign (#),
a property name, and a regular expression
for the property value. For example, to find all of the video (.avi)
files, issue the query #filename *.avi. Regular expressions will never match
the special properties contents (#contents) and all (#all). Properties that
are not retrievable at query time cannot be used in # queries. These include
HTML META properties not stored in the property cache.
Property names are preceded by either the “at” (@) or number sign (#)
character. Use @ for relational queries and # for regular expression queries.
If no property name is specified, @contents is assumed.
Properties available for all files include:
| Property Name | Description |
| All | Matches words, phrases, and any property |
| Contents | Words and phrases in the file |
| Filename | Name of the file |
| Size | File size |
| Write | Last time the file was modified |
ActiveX property values can also be used in queries. Web sites with files
created by most ActiveX-aware applications can be queried for these properties:
| Property Name | Description |
| DocTitle | Title of the document |
| DocSubject | Subject of the document |
| DocAuthor | The document’s author |
| DocKeywords | Keywords for the document |
| DocComments | Comments about the document |
For a complete list of property names, see the List
of Property Names later on this page.
Relational operators are used in relational property queries.
| To Search
For |
Example |
Results |
| Property values in relation to a fixed value |
@size < 100
@size <= 100
@size = 100
@size != 100
@size >= 100
@size > 100 |
Files whose size matches the query |
| Property values with all of a set of bits on |
@attrib ^a 0x820 |
Compressed files with the archive bit on |
| Property values with some of a set of bits on |
@attrib ^s 0x20 |
Files with the archive bit on |
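The two bit-set operators in the table above can be read as ordinary bitmask tests. The Python sketch below shows that reading; it is an interpretation of "all of a set of bits on" (^a) and "some of a set of bits on" (^s), and the function names are hypothetical. The two constants are the Win32 archive and compressed file attribute values, which together make up the 0x820 mask in the example.

```python
FILE_ATTRIBUTE_ARCHIVE = 0x20      # Win32 archive attribute bit
FILE_ATTRIBUTE_COMPRESSED = 0x800  # Win32 compressed attribute bit


def all_bits_on(value: int, mask: int) -> bool:
    """Reading of ^a: every bit in the mask is set in the value."""
    return (value & mask) == mask


def some_bits_on(value: int, mask: int) -> bool:
    """Reading of ^s: at least one bit in the mask is set in the value."""
    return (value & mask) != 0


archive_only = FILE_ATTRIBUTE_ARCHIVE
compressed_archive = FILE_ATTRIBUTE_ARCHIVE | FILE_ATTRIBUTE_COMPRESSED

print(all_bits_on(archive_only, 0x820))        # False: compressed bit is off
print(all_bits_on(compressed_archive, 0x820))  # True
print(some_bits_on(archive_only, 0x20))        # True
```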
| To Search
For |
Example |
Results |
| A specific value |
@DocAuthor = Bill
Barnes |
Files authored by “Bill Barnes” |
| Values beginning with a prefix |
#DocAuthor George* |
Files whose author property begins with “George” |
| Files with any of a set of extensions |
#filename *.|(exe|,dll|,sys|) |
Files with .exe, .dll, or .sys extensions |
| Files modified after a certain date |
@write > 96/2/14
10:00:00 |
Files modified after February 14, 1996, at 10:00 GMT |
| Files modified after a relative date |
@write > -1d2h |
Files modified in the last 26 hours |
| Vectors matching a vector |
@vectorprop = { 10,
15, 20 } |
ActiveX documents with a vectorprop value of { 10, 15, 20
} |
| Vectors where each value matches a criterion |
@vectorprop >^a 15 |
ActiveX documents with a vectorprop value in which all
values in the vector are greater than 15 |
| Vectors where at least one value matches a criterion |
@vectorprop =^s 15 |
ActiveX documents with a vectorprop value in which at
least one value is 15 |
- Be sure to use the pound (#) character before the property name when using
a regular expression in a property value and an “at” (@) character
otherwise. The equal (=) relational operator is assumed for
regular-expression queries.
- File name (#filename) is the only property that efficiently supports
regular expressions with wildcards to the left of text.
- Date and time values are of the form yyyy/mm/dd hh:mm:ss or yyyy-mm-dd
hh:mm:ss. The first two characters of the year and the entire time can
be omitted. If you omit the first two characters of the year, a value of 29
or less is interpreted as a year in the 2000s, and a value of 30 or greater
as a year in the 1900s. All dates and times are in Greenwich Mean Time (GMT).
- Dates and times relative to the current time can be expressed with a minus
(-) character followed by one or more integer and time unit pairs.
Time units are expressed as: (y) for years, (m) for months, (w) for weeks,
(d) for days, (h) for hours, (n) for minutes, and (s) for seconds. For
example, -1d2h means 26 hours before the current time. In absolute date
expressions, a three-digit millisecond value can optionally be specified
after the seconds value. For example, 1997/12/8 10:10:03:452.
- Currency values are of the form x.y, where x is the
whole value amount and y is the fractional amount. There is no
assumption about units.
- Boolean values are (t) or (true) for TRUE and (f) or
(false) for FALSE.
- Vectors (VT_VECTOR) are expressed as an opening brace ({), followed by a
comma-separated list of values, then a closing brace (}).
- Single-value expressions that are compared against vectors are expressed
as a relational operator, then a (^a) for
all of or a (^s) for some of.
- Numeric values can be in decimal or hexadecimal (preceded by 0x).
- The contents property does not support relational operators. If a
relational operator is specified, no results will be found. For example,
@contents Microsoft will find documents containing Microsoft, but @contents=Microsoft
will find none.
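To make the relative date notation concrete, here is a minimal Python sketch that converts an expression such as -1d2h into a time span. It is an illustration only: the function name is hypothetical, lowercase unit letters and at least one integer and unit pair are assumed, and years and months are rejected because their length depends on the calendar and the notes above do not say how the query engine resolves them.

```python
import re
from datetime import timedelta

# Seconds per fixed-length unit: weeks, days, hours, minutes (n), seconds.
_UNIT_SECONDS = {"w": 7 * 24 * 3600, "d": 24 * 3600, "h": 3600, "n": 60, "s": 1}
_PAIR = re.compile(r"(\d+)([ymwdhns])")


def parse_relative_offset(expr: str) -> timedelta:
    """Return how far before the current time an expression like '-1d2h' points."""
    # Assumption: a minus sign followed by at least one integer/unit pair.
    if not re.fullmatch(r"-(?:\d+[ymwdhns])+", expr):
        raise ValueError(f"not a relative time expression: {expr!r}")
    total = 0
    for amount, unit in _PAIR.findall(expr):
        if unit in "ym":
            # Assumption: calendar-dependent units are left out of this sketch.
            raise ValueError("years and months are not handled in this sketch")
        total += int(amount) * _UNIT_SECONDS[unit]
    return timedelta(seconds=total)


print(parse_relative_offset("-1d2h"))  # 1 day, 2:00:00
```

The result for -1d2h is 26 hours, which agrees with the "Files modified in the last 26 hours" example in the table above.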
Regular expressions in property queries are defined as follows:
- Any character except asterisk (*), period (.), question mark (?), and
vertical bar (|) defaults to matching just itself.
- Regular expressions can be enclosed in matching quotes ("), and must be
enclosed in quotes if they contain a space ( ) or closing parenthesis ()).
- The characters *, ., and ? behave as they behave in Windows; they match
any number of characters, match (.) or end of string, and match any one
character, respectively.
- The character | is an escape character. After |, the following characters
have special meaning:
( opens a group. Must be followed by a matching ).
) closes a group. Must be preceded by a matching (.
[ opens a character class. Must be followed by a matching (un-escaped) ].
{ opens a counted match. Must be followed by a matching }.
} closes a counted match. Must be preceded by a matching {.
, separates OR clauses.
* matches zero or more occurrences of the preceding expression.
? matches zero or one occurrences of the preceding expression.
+ matches one or more occurrences of the preceding expression.
Anything else, including |, matches itself.
- Between square brackets ([]) the following characters have special
meaning:
^ matches everything but following classes. Must be the first character.
] matches ]. May only be preceded by ^, otherwise it closes the class.
- range operator. Preceded and followed by normal characters.
Anything else matches itself (or begins or ends a range at itself).
- Between curly braces ({}) the following syntax applies:
|{m|} matches exactly m occurrences of the preceding expression.
(0 < m < 256).
|{m,|} matches at least m occurrences of the preceding
expression. (1 < m < 256).
|{m,n|} matches between m and n occurrences of the
preceding expression, inclusive. (0 < m < 256, 0 < n < 256).
- To match *, ., and ?, enclose them in brackets (for example, |[*]sample
will match “*sample”).
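One way to get a feel for these rules is to map a pattern onto a more familiar regular expression dialect. The Python sketch below translates a subset of the syntax (the unescaped wildcards plus the |( |) |, |* |? |+ escapes) into a pattern for Python's re module. It is an approximation under stated assumptions, not the query engine's matcher: the function names are hypothetical, character classes and counted matches are not covered, and whole-value, case-insensitive matching is assumed.

```python
import re

# Meaning of the character that follows the | escape (subset only).
_AFTER_ESCAPE = {"(": "(?:", ")": ")", ",": "|", "*": "*", "?": "?", "+": "+"}
# Unescaped wildcards: any run of characters, any one character,
# and a period or the end of the string.
_WILDCARDS = {"*": ".*", "?": ".", ".": r"(?:\.|$)"}


def to_python_regex(pattern: str) -> str:
    """Translate a property-query regular expression into Python re syntax."""
    out = []
    chars = iter(pattern)
    for ch in chars:
        if ch == "|":
            nxt = next(chars, None)
            if nxt is None:
                raise ValueError("pattern ends with a dangling | escape")
            if nxt in "[{}":
                raise NotImplementedError(
                    "character classes and counted matches are not covered"
                )
            # Anything else after |, including | itself, matches itself.
            out.append(_AFTER_ESCAPE.get(nxt, re.escape(nxt)))
        else:
            out.append(_WILDCARDS.get(ch, re.escape(ch)))
    return "".join(out)


def matches(pattern: str, value: str) -> bool:
    # Assumption: the pattern must match the whole value, ignoring case.
    return re.fullmatch(to_python_regex(pattern), value, re.IGNORECASE) is not None


print(matches("*.|(exe|,dll|,sys|)", "setup.exe"))   # True
print(matches("*.|(exe|,dll|,sys|)", "readme.txt"))  # False
```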
| Example | Results |
| @size > 1000000 | Pages larger than 1 million bytes |
| @write > 95/12/23 | Pages modified after the date |
| Apple tree | Pages with the phrase “apple tree” |
| "apple tree" | Same as above |
| @contents apple tree | Same as above |
| Microsoft and @size > 1000000 | Pages with the word “Microsoft” that are larger than 1 million bytes |
| "microsoft and @size > 1000000" | Pages with the phrase specified (not the same as above) |
| #filename *.avi | Video files (the # prefix is used because the query contains a regular expression) |
| @attrib ^s 32 | Pages with the archive attribute bit on |
| @docauthor = John Smith | Pages with the given author |
| $contents why is the sky blue? | Pages that match the query |
| @size < 100 & #filename *.gif | Graphics Interchange Format (GIF) files less than 100 bytes in size |
These properties are always available for queries. Additional properties may
also be available depending on the configuration of the Web server.
| Friendly
Name |
Datatype |
Property |
| A_HRef |
DBTYPE_WSTR | DBTYPE_BYREF |
Text of HTML HREF. This property name was
created for Microsoft® Site Server and corresponds with the Index
Server property name HtmlHRef. Can be queried but not retrieved. |
| Access |
VT_FILETIME |
Last time file was accessed. |
| All |
(not applicable) |
Searches every property for a string. Can
be queried but not retrieved. |
| AllocSize |
DBTYPE_I8 |
Size of disk allocation for file. |
| Attrib |
DBTYPE_UI4 |
File attributes. Documented in Win32 SDK. |
| ClassId |
DBTYPE_GUID |
Class ID of object, for example, WordPerfect,
Word, and so on. |
| Characterization |
DBTYPE_WSTR | DBTYPE_BYREF |
Characterization, or abstract, of document.
Computed by Index Server. |
| Contents |
(not applicable) |
Main contents of file. Can be queried but
not retrieved. |
| Create |
VT_FILETIME |
Time file was created. |
| Directory |
DBTYPE_WSTR | DBTYPE_BYREF |
Physical path to the file, not including the
file name. |
| DocAppName |
DBTYPE_WSTR | DBTYPE_BYREF |
Name of application that created the file. |
| DocAuthor |
DBTYPE_WSTR | DBTYPE_BYREF |
Author of document. |
| DocByteCount |
DBTYPE_I4 |
Number of bytes in a document. |
| DocCategory |
DBTYPE_STR | DBTYPE_BYREF |
Type of document such as a memo, schedule, or whitepaper. |
| DocCharCount |
DBTYPE_I4 |
Number of characters in document. |
| DocComments |
DBTYPE_WSTR | DBTYPE_BYREF |
Comments about document. |
| DocCompany |
DBTYPE_STR | DBTYPE_BYREF |
Name of the company for which the document was written. |
| DocCreatedTm |
VT_FILETIME |
Time document was created. |
| DocEditTime |
VT_FILETIME |
Total time spent editing document. |
| DocHiddenCount |
DBTYPE_I4 |
Number of hidden slides in a Microsoft® PowerPoint document. |
| DocKeywords |
DBTYPE_WSTR | DBTYPE_BYREF |
Document keywords. |
| DocLastAuthor |
DBTYPE_WSTR | DBTYPE_BYREF |
Most recent user who edited document. |
| DocLastPrinted |
VT_FILETIME |
Time document was last printed. |
| DocLastSavedTm |
VT_FILETIME |
Time document was last saved. |
| DocLineCount |
DBTYPE_I4 |
Number of lines contained in a document. |
| DocManager |
DBTYPE_STR | DBTYPE_BYREF |
Name of the manager of the document’s author. |
| DocNoteCount |
DBTYPE_I4 |
Number of pages with notes in a PowerPoint document. |
| DocPageCount |
DBTYPE_I4 |
Number of pages in document. |
| DocParaCount |
DBTYPE_I4 |
Number of paragraphs in a document. |
| DocPartTitles |
DBTYPE_STR | DBTYPE_VECTOR |
Names of document parts. For example, in Excel the part titles are the
names of spreadsheets, in PowerPoint the slide titles, and in Word for
Windows the names of the documents in the master document. |
| DocPresentationTarget |
DBTYPE_STR | DBTYPE_BYREF |
Target format (35mm, printer, video, and so on) for a presentation in
PowerPoint. |
| DocRevNumber |
DBTYPE_WSTR | DBTYPE_BYREF |
Current version number of document. |
| DocSlideCount |
DBTYPE_I4 |
Number of slides in a PowerPoint document. |
| DocSubject |
DBTYPE_WSTR | DBTYPE_BYREF |
Subject of document. |
| DocTemplate |
DBTYPE_WSTR | DBTYPE_BYREF |
Name of template for document. |
| DocTitle |
DBTYPE_WSTR | DBTYPE_BYREF |
Title of document. |
| DocWordCount |
DBTYPE_I4 |
Number of words in document. |
| FileIndex |
DBTYPE_I8 |
Unique ID of file. |
| FileName |
DBTYPE_WSTR | DBTYPE_BYREF |
Name of file. |
| HitCount |
DBTYPE_I4 |
Number of hits (words matching query) in
file. |
| HtmlHRef |
DBTYPE_WSTR | DBTYPE_BYREF |
Text of HTML HREF. Can be queried but not
retrieved. |
| HtmlHeading1 |
DBTYPE_WSTR | DBTYPE_BYREF |
Text of HTML document in style H1. Can be
queried but not retrieved. |
| HtmlHeading2 |
DBTYPE_WSTR | DBTYPE_BYREF |
Text of HTML document in style H2. Can be
queried but not retrieved. |
| HtmlHeading3 |
DBTYPE_WSTR | DBTYPE_BYREF |
Text of HTML document in style H3. Can be
queried but not retrieved. |
| HtmlHeading4 |
DBTYPE_WSTR | DBTYPE_BYREF |
Text of HTML document in style H4. Can be
queried but not retrieved. |
| HtmlHeading5 |
DBTYPE_WSTR | DBTYPE_BYREF |
Text of HTML document in style H5. Can be
queried but not retrieved. |
| HtmlHeading6 |
DBTYPE_WSTR | DBTYPE_BYREF |
Text of HTML document in style H6. Can be
queried but not retrieved. |
| Img_Alt |
DBTYPE_WSTR | DBTYPE_BYREF |
Alternate text for <IMG> tags. Can
be queried but not retrieved. |
| Path |
DBTYPE_WSTR | DBTYPE_BYREF |
Full physical path to file, including file
name. |
| Rank |
DBTYPE_I4 |
Rank of row. Ranges from 0 to 1,000. Larger
numbers indicate better matches. |
| RankVector |
DBTYPE_I4 | DBTYPE_VECTOR |
Ranks of individual components of a vector
query. |
| ShortFileName |
DBTYPE_WSTR | DBTYPE_BYREF |
Short (8.3) file name. |
| Size |
DBTYPE_I8 |
Size of file, in bytes. |
| USN |
DBTYPE_I8 |
Update Sequence Number. NTFS drives only. |
| VPath |
DBTYPE_WSTR | DBTYPE_BYREF |
Full virtual path to file, including file
name. If more than one possible path, then the best match for the
specific query is chosen. |
| WorkId |
DBTYPE_I4 |
Internal ID for file. Used within Index
Server. |
| Write |
VT_FILETIME |
Last time file was written. |
To use properties that are not in the previous list in a restriction, in a
sort specification, or as a retrieved column, you must define them in a
[Names] section of the .idq file, using the following format:
[Names]
#Properties that are not in the standard list
Propertyname ( Datatype ) = GUID ["Name" | propid]
In the syntax, "Name" is the property name ("Sales"
in the following example), and propid is the property ID in
hexadecimal. Note that you need to surround the friendly name with quotation
marks, but the property ID does not take quotation marks.
For example, suppose you want to define an HTML meta tag as a property name
that somebody can search for. The property you want to define is Sales.
To define the Sales property
- In the .idq file, under the [Names] section, add the following line.
MetaDescription(DBTYPE_WSTR) = d1b5d3f0-c0b3-11cf-9a92-00a0c908dbf1 "Sales"
The GUID number comes from the MetaTagClsid parameter in
the registry, at the following location:
HKEY_LOCAL_MACHINE
\SYSTEM
\CurrentControlSet
\Control
\HtmlFilter
\MetaTagClsid
- Then, in the HTML files where you want the tag to appear, define the meta
description.
For example, say you want to search for all files that give sales
projections for the future:
In File1.htm:
<META NAME="Sales" CONTENT="Projections for 1998">
In File2.htm:
<META NAME="Sales" CONTENT="Projections for 1999">
In File3.htm:
<META NAME="Sales" CONTENT="Sales in 1997">
Note: Be sure to add your META NAME tags
between the <head> and </head> HTML tags at the beginning of the
file.
You can now search for all files that show sales projections. Send the
following query:
@metadescription projections
This query returns all the files with the word projections in the
CONTENT field of the meta tag. In this example, File1.htm and File2.htm are
returned.
But suppose you want to search for sales by year, for example a list of sales
in 1997. Send the following query:
@metadescription 1997
File3.htm is returned.
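The outcome of these two queries can be reproduced offline with a short Python sketch that reads the CONTENT value of the Sales meta tag in each file and checks it for a word. This only simulates the matching on the three sample files; the class and function names are hypothetical, and whole-word, case-insensitive matching is an assumption rather than a description of how Index Server indexes meta tags.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect the CONTENT values of META tags with a given NAME."""

    def __init__(self, name: str):
        super().__init__()
        self.name = name.lower()
        self.contents = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lowercase.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == self.name:
            self.contents.append(attr.get("content") or "")


def files_matching(files: dict, meta_name: str, word: str) -> list:
    """Return the names of files whose meta content contains the word."""
    hits = []
    for filename, html in files.items():
        collector = MetaCollector(meta_name)
        collector.feed(html)
        # Assumption: whole-word, case-insensitive comparison.
        words = " ".join(collector.contents).lower().split()
        if word.lower() in words:
            hits.append(filename)
    return hits


files = {
    "File1.htm": '<head><META NAME="Sales" CONTENT="Projections for 1998"></head>',
    "File2.htm": '<head><META NAME="Sales" CONTENT="Projections for 1999"></head>',
    "File3.htm": '<head><META NAME="Sales" CONTENT="Sales in 1997"></head>',
}

print(files_matching(files, "Sales", "projections"))  # ['File1.htm', 'File2.htm']
print(files_matching(files, "Sales", "1997"))         # ['File3.htm']
```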