- Clean URLs: search/type/keywords e.g. "search/node/drupal release". The search
form is POST submitted, but drupal_gotos to a GET page. This makes it easy to
copy/paste search URLs, and makes the pager a lot cleaner.
- Remember the search keywords when switching between the search tabs. This is
done through the same GET URLs rather than the session, so it does not mess up
between multiple browser tabs.
- Report which keywords were ignored because they were too short.
- #820: Provide search block
- Treat multiple wildcards in a row as one
1) The different types of search, which used to be radio button options in the search form, are now subtabs of "search" (default "search/node"). This seems better from a UI point of view, but also has another advantage: modules which implement a custom search form (flexinode, project) can add it as a subtab of search. This means that all search forms will be located in the same place, and also without needing an extra api call to search.module.
2) The current code was a bit hackish, as the indexing of comments along with nodes was hardcoded in node.module. Instead, I created a nodeapi operation "update index" which allows modules to add more data for a node that is being indexed. Comments are now indexed using this mechanism and from comment.module, which is a lot cleaner.
3) The search results format was also hardcoded to include "N comments". I replaced this with a nodeapi operation "search result" and moved the comment code to comment.module where it belongs. This op is quite useful, as for example I also modified upload.module to add "N attachments" to a search result if any are present.
* Less logic in theme code.
* Encourages use of the menu system.
* Easier to find where a title or breadcrumb comes from in other people's code because there are less places to look. Look in menu and then grep for the appropriate set function. Looking for calls to theme_page() is hard because there are too many of them.
* Very slightly more efficient.
db_query($query, $a, $b, $c);
db_query($query, array($a, $b, $c));
This usage is particularly interesting when the query is constructed dynamically, and the amount of arguments to pass varies. In that case we use the second method to avoid using call_user_func_array(). This behaviour is not documented explicitly, but it is used in several places.
However, db_query_range() and pager_query() do not support this syntax properly, which means there are several pieces of code which still revert to the ugly call_user_func_array() call.
This patch updates db_query_range() and pager_query() so they support the array-passing method. I also added documentation about this method to each of the db functions.
I also cleaned up the code for db_query (it was weird and hard to understand) and moved db_query() and db_queryd() from database.xxxxx.inc to database.inc: it was the same between both mysql and pgsql, as it doesn't do anything database specific. It just prefixes the tables and inserts the arguments. The actual db query is performed in _db_query(), which is still in database.xxxxx.inc.
Finally, I updated several places with the new syntax, and the code is a lot cleaner. For example:
- array_unshift($params, "SELECT u.* FROM {users} u WHERE $query u.status < 3");
- $params[] = 0;
- $params[] = 1;
- $result = call_user_func_array('db_query_range', $params);
+ $result = db_query_range("SELECT u.* FROM {users} u WHERE $query u.status < 3", $params, 0, 1);
and
- return call_user_func_array('db_query_range', array_merge(array($query), $args, array((int)$pager_from_array[$element], (int)$limit)));
+ return db_query_range($query, $args, (int)$pager_from_array[$element], (int)$limit);
I've tested it on mysql. I didn't alter the actual db behaviour, so pgsql should be okay too.
This patch is important because many people avoid the call_user_func_array() method and put data directly into the db query. This is very, very bad because the database prefix will be applied to it, and strip out braces. It's also generally bad form as you have to call check_query() yourself. With the new, documented syntax, there is no more excuse to put data directly in the query.
+ When a comment is posted, a node needs to be re-indexed. Luckily, we can use node_comment_statistics for this easily.
+ When a node is deleted, it should be deleted from the search index as well.
+ The search wipe didn't properly remove links to nodes from the index.
+ Section url was faulty in _help.
+ Minor code rearrangement.
+ Display 'friendly' name rather than module name in search watchdog
messages.
+ Remove left-over from search_total table.
+ Add index wipe button to the admin
+ Moved the admin to admin/settings/search
+ Prevented menu bug when node modules update the breadcrumb in view
(thanks JonBob).
+ Changed search_total table's word key to PRIMARY.
1) Clean up the text analyser: make it handle UTF-8 and all sorts of characters. The word splitter now does intelligent splitting into words and supports all Unicode characters. It has smart handling of acronyms, URLs, dates, ...
2) It now indexes the filtered output, which means it can take advantage of HTML tags. Meaningful tags (headers, strong, em, ...) are analysed and used to boost certain words scores. This has the side-effect of allowing the indexing of PHP nodes.
3) Link analyser for node links. The HTML analyser also checks for links. If they point to a node on the current site (handles path aliases) then the link's words are counted as part of the target node. This helps bring out commonly linked FAQs and answers to the top of the results.
4) Index comments along with the node. This means that the search can make a difference between a single node/comment about 'X' and a whole thread about 'X'. It also makes the search results much shorter and more relevant (before this patch, comments were even shown first).
5) We now keep track of total counts as well as a per item count for a word. This allows us to divide the word score by the total before adding up the scores for different words, and automatically makes noisewords have less influence than rare words. This dramatically improves the relevancy of multiword searches. This also makes the disadvantage of now using OR searching instead of AND searching less problematic.
6) Includes support for text preprocessors through a hook. This is required to index Chinese and Japanese, because these languages do not use spaces between words. An external utility can be used to split these into words through a simple wrapper module. Other uses could be spell checking (although it would have no UI).
7) Indexing is now regulated: only a certain amount of items will be indexed per cron run. This prevents PHP from running out of memory or timing out. This also makes the reindexing required for this patch automatic. I also added an index coverage estimate to the search admin screen.
8) Code cleanup! Moved all the search stuff from common.inc into search.module, rewired some hooks and simplified the functions used. The search form and results now also use valid XHTML and form_ functions. The search admin was moved from search/configure to admin/search for consistency.
9) Improved search output: we also show much more info per item: date, author, node type, amount of comments and a cool dynamic excerpt à la Google. The search form is now much more simpler and the help is only displayed as tips when no search results are found.
10) By moving all search logic to SQL, I was able to add a pager to the search results. This improves usability and performance dramatically.